OpenMP for nested loops

I am writing a lattice boltzmann solver and to do so I'm using OpenMP for loop parallelization. After using OpenMp not only the execution time is not reduced but also I'm getting wrong answer. I am attaching two functions below in which I used OpenMp. I don't know whether I need to consider any specific criteria for loop parallelization or not. Thanks for your help in advance.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
void LBMSOLVER::collision() {

	double t1, t2;

#pragma omp parallel 
	{
		num_omp_threads_ = omp_get_num_threads();
#pragma for num_threads(num_threads_) collapse (2)
		for (int j = 0; j <= Ny_; j++)
			for (int i = 0; i <= Nx_; i++)
			{
				t1 = u_[i][j] * u_[i][j] + v_[i][j] * v_[i][j];
				for (int k = 0; k <= 8; k++)
				{
					t2 = u_[i][j] * cx[k] + v_[i][j] * cy[k];
					feq_[k][i][j] = rho_[i][j] * w[k] * (1.0 + 3.0 * t2 + 4.5 * t2 * t2 - 1.50 * t1);
					f_[k][i][j] = omega * feq_[k][i][j] + (1.0 - omega) * f_[k][i][j];
				}
			}
	}
}

void LBMSOLVER::streaming() {

#pragma omp parallel 
	{
		num_omp_threads_ = omp_get_num_threads();
#pragma for num_threads(num_threads_) 

		for (int j = 0; j <= Ny_; j++)
		{
			for (int i = Nx_; i >= 1; i--) //Right to Left
				f_[1][i][j] = f_[1][i - 1][j];

			for (int i = 0; i <= Nx_ - 1; i++) //Left to Right
				f_[3][i][j] = f_[3][i + 1][j];
		}
		for (int j = Ny_; j >= 1; j--) //Top to Bottom
		{
			for (int i = 0; i <= Nx_; i++)
				f_[2][i][j] = f_[2][i][j - 1];

			for (int i = Nx_; i >= 1; i--)
				f_[5][i][j] = f_[5][i - 1][j - 1];

			for (int i = 0; i <= Nx_ - 1; i++)
				f_[6][i][j] = f_[6][i + 1][j - 1];
		}
		for (int j = 0; j <= Ny_ - 1; j++) //Bottom to Top
		{
			for (int i = 0; i <= Nx_; i++)
				f_[4][i][j] = f_[4][i][j + 1];

			for (int i = 0; i <= Nx_ - 1; i++)
				f_[7][i][j] = f_[7][i + 1][j + 1];

			for (int i = Nx_; i >= 1; i--)
				f_[8][i][j] = f_[8][i - 1][j + 1];
		}
	}
}   
Last edited on
> for (int j = 0; j <= Ny_; j++)
I would check every single one of your loops used to index an array.

If you have
int a[N];
then the correct loop to access each element is
for ( i = 0 ; i < N ; i++ )

Why is there no f_[0][i][j] in your LBMSOLVER::streaming() function?
Arrays are 0-based, not 1-based.
Thanks for the answer. Sorry but I didn't get this
"would check every single one of your loops used to index an array"

There is no f[0][i][j] because or how lattice boltzmann works, it doesn't need to be streamed.
Alright then, show us how you've declared all those arrays (or vectors, or whatever other container you've overloaded [] for).

> There is no f[0][i][j] because or how lattice boltzmann works, it doesn't need to be streamed.
Fine, but did you declare the size of the left dimension to be 9 ?
I think the iterations of the nested loops on line 38 and 49 are dependent.
This may be why you're getting the wrong answer.
Last edited on
@salem c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
typedef std::vector<double>             VecDbl_t;
typedef std::vector<VecDbl_t>           VecVecDbl_t;

typedef std::vector<VecVecDbl_t>        VecVecVecDbl_t;

VecVecVecDbl_t f_,feq_

	for (int k = 0; k < 9; k++) {
		f_[k].resize(Nx_ + 1);
		feq_[k].resize(Nx_ + 1);
		u_.resize(Nx_ + 1);
		v_.resize(Nx_ + 1);
		rho_.resize(Nx_ + 1);

		for (int i = 0; i < Nx_ + 1; i++) {
			f_[k][i].resize(Ny_ + 1);
			feq_[k][i].resize(Ny_ + 1);
			u_[i].resize(Ny_ + 1);
			v_[i].resize(Ny_ + 1);
			rho_[i].resize(Ny_ + 1);

		}
	}

Mmm, OK.
It would seem you have enough +1 things in place.

However,
> for (int k = 0; k <= 8; k++)
> f_[k][i][j] = ..

You then go on to say

There is no f[0][i][j] because or how lattice boltzmann works, it doesn't need to be streamed.

Your f_[8] matrix in LBMSOLVER::streaming() would seem to contain just a whole bunch of nothing.

@salem c why f_[8][i][j] seems to have bunch of nothing?
Because I was wrong, and got blinded by your mixing of <, <= and changing limits in your awful to read code.

@salem c the code is easy to read. I almost used <= everywhere Expect variable initialization, you haven't noticed any of them?!
Topic archived. No new replies allowed.