Best to disregard what I said in my first post. I didn't understand `#pragma omp parallel`. Sorry.
> I don't quite understand why I can't use a single shared `result` variable in the loop rather than the array.
Now that I know a little better: it's because this line of code is non-atomic, meaning it can be interrupted partway through its execution.

```c++
result += 4 / (1 + x_start * x_start) * dx;
```
To illustrate why this is a problem, imagine running this snippet of code:
```c++
int x = 0;
#pragma omp parallel num_threads(2)
{
    x = x + 1; // data race: two threads read-modify-write x concurrently
}
```
A possible flow of execution looks like this:
1. Thread A reads `x` and gets 0 (but does not write back to `x` yet).
2. Thread B preempts thread A.
3. Thread B reads `x` and gets 0.
4. Thread B writes 0 + 1 = 1 into `x`.
5. Thread A resumes.
6. Thread A has already read `x` as 0, so it computes 0 + 1.
7. Thread A writes 1 into `x`, silently overwriting thread B's update.
At the end of the process `x` contains 1, even though two increments ran. This is a symptom of a data race.
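For what it's worth, here is one way the original loop could avoid the race. This is a minimal sketch, assuming the loop integrates 4/(1+x²) over [0,1] with the midpoint rule; the loop bounds and the names other than `result` and `dx` are my guesses, not the original code:

```c++
#include <cstdio>

int main() {
    const int n = 1000000;       // number of slices (assumed)
    const double dx = 1.0 / n;   // slice width (assumed)
    double result = 0.0;

    // reduction(+:result) gives every thread its own private copy of
    // result and adds the copies together once at the end, so no two
    // threads ever write the shared variable at the same time.
    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < n; ++i) {
        double x = (i + 0.5) * dx;           // midpoint of the i-th slice
        result += 4.0 / (1.0 + x * x) * dx;  // integrate 4/(1+x^2) over [0,1]
    }

    std::printf("pi is approximately %.10f\n", result);
    return 0;
}
```

Putting `#pragma omp atomic` before the update would also remove the race, but the reduction is usually faster because the threads don't have to synchronize on every single iteration.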
> `x_start` should only ever be modified by its corresponding rank, no?
Yes. The race condition only affects `result`, not `x_start`.
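To see why, here's a hypothetical sketch of how a per-rank `x_start` might be set up; everything besides the name `x_start` is my assumption, not the original code:

```c++
#include <omp.h>

int main() {
    // Because x_start is declared inside the parallel region, every
    // thread gets its own private copy, so no rank can ever modify
    // another rank's x_start.
    #pragma omp parallel
    {
        int rank = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        double x_start = rank * (1.0 / nthreads); // this rank's sub-interval start
        (void)x_start; // ... this thread's share of the integral would go here ...
    }
    return 0;
}
```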
> What's the difference between "normal" C++ threading and OpenMP? I thought one needed OpenMP for multithreading, just like one would use MPI for multiprocessing.
Threads are an operating system feature, and the OS provides the interfaces to create and control them. Both the C++ standard library and OpenMP wrap these system-specific interfaces so you can use them from portable code.
Relative to the C++ standard library, OpenMP is a "high-level" API: it hides details such as thread creation, joining, and dividing up the work, and does a lot in the background. It may also be that OpenMP gives up some fine-grained control over the system's resources in exchange for that convenience, but I don't have enough info to say for sure.
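To make the contrast concrete, here's the earlier two-thread increment written against the C++ standard library. Note how much is explicit (creating the threads, joining them, and choosing `std::atomic` as the synchronization mechanism) compared to the one-pragma OpenMP version:

```c++
#include <atomic>
#include <cstdio>
#include <thread>

int main() {
    std::atomic<int> x{0};

    // Each thread performs an atomic read-modify-write, so the two
    // increments cannot interleave the way they did in the race above.
    auto work = [&x] { x.fetch_add(1); };

    // Thread creation, joining, and synchronization are all explicit.
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    std::printf("x = %d\n", x.load()); // always prints x = 2
    return 0;
}
```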