Oct 24, 2022 at 12:01am UTC
I am writing a lattice boltzmann solver and to do so I'm using OpenMP for loop parallelization. After using OpenMp not only the execution time is not reduced but also I'm getting wrong answer. I am attaching two functions below in which I used OpenMp. I don't know whether I need to consider any specific criteria for loop parallelization or not. Thanks for your help in advance.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
void LBMSOLVER::collision() {
double t1, t2;
#pragma omp parallel
{
num_omp_threads_ = omp_get_num_threads();
#pragma for num_threads(num_threads_) collapse (2)
for (int j = 0; j <= Ny_; j++)
for (int i = 0; i <= Nx_; i++)
{
t1 = u_[i][j] * u_[i][j] + v_[i][j] * v_[i][j];
for (int k = 0; k <= 8; k++)
{
t2 = u_[i][j] * cx[k] + v_[i][j] * cy[k];
feq_[k][i][j] = rho_[i][j] * w[k] * (1.0 + 3.0 * t2 + 4.5 * t2 * t2 - 1.50 * t1);
f_[k][i][j] = omega * feq_[k][i][j] + (1.0 - omega) * f_[k][i][j];
}
}
}
}
void LBMSOLVER::streaming() {
#pragma omp parallel
{
num_omp_threads_ = omp_get_num_threads();
#pragma for num_threads(num_threads_)
for (int j = 0; j <= Ny_; j++)
{
for (int i = Nx_; i >= 1; i--) //Right to Left
f_[1][i][j] = f_[1][i - 1][j];
for (int i = 0; i <= Nx_ - 1; i++) //Left to Right
f_[3][i][j] = f_[3][i + 1][j];
}
for (int j = Ny_; j >= 1; j--) //Top to Bottom
{
for (int i = 0; i <= Nx_; i++)
f_[2][i][j] = f_[2][i][j - 1];
for (int i = Nx_; i >= 1; i--)
f_[5][i][j] = f_[5][i - 1][j - 1];
for (int i = 0; i <= Nx_ - 1; i++)
f_[6][i][j] = f_[6][i + 1][j - 1];
}
for (int j = 0; j <= Ny_ - 1; j++) //Bottom to Top
{
for (int i = 0; i <= Nx_; i++)
f_[4][i][j] = f_[4][i][j + 1];
for (int i = 0; i <= Nx_ - 1; i++)
f_[7][i][j] = f_[7][i + 1][j + 1];
for (int i = Nx_; i >= 1; i--)
f_[8][i][j] = f_[8][i - 1][j + 1];
}
}
}
Last edited on Oct 24, 2022 at 12:02am UTC
Oct 24, 2022 at 5:09am UTC
> for (int j = 0; j <= Ny_; j++)
I would check every single one of your loops used to index an array.
If you have
int a[N];
then the correct loop to access each element is
for ( i = 0 ; i < N ; i++ )
Why is there no f_[0][i][j] in your LBMSOLVER::streaming() function?
Arrays are 0-based, not 1-based.
Oct 24, 2022 at 7:24am UTC
Alright then, show us how you've declared all those arrays (or vectors, or whatever other container you've overloaded [] for).
> There is no f[0][i][j] because or how lattice boltzmann works, it doesn't need to be streamed.
Fine, but did you declare the size of the left dimension to be 9 ?
Oct 24, 2022 at 1:45pm UTC
I think the iterations of the nested loops on line 38 and 49 are dependent.
This may be why you're getting the wrong answer.
Last edited on Oct 24, 2022 at 1:46pm UTC
Oct 26, 2022 at 12:25am UTC
@salem c why f_[8][i][j] seems to have bunch of nothing?
Oct 26, 2022 at 3:12am UTC
Because I was wrong, and got blinded by your mixing of <, <= and changing limits in your awful to read code.
Oct 26, 2022 at 4:04pm UTC
@salem c the code is easy to read. I almost used <= everywhere Expect variable initialization, you haven't noticed any of them?!