Using the cache and parallelizing a for loop

Hi, I am quite new to C++ and my teacher asked me a question I can't answer.

for (i = 0; i < Nu; i++)
{
    for (j = 0; j < Nu; j++)
    {
        *(C + (i * Nu + j)) = 0.0;
        #pragma omp parallel for
        for (k = 0; k < Nu; k++)
        {
            *(C + (i * Nu + j)) += *(A + (i * Nu + k)) * *(D + (j * Nu + k));
        }
    }
}

Is the cache used well?
Is the OpenMP directive in the right position?
If yes, why? If not, where should it go, and why?

Thank you, and sorry for my English.
No one?
Time the program with and without the #pragma and try moving it around. This is standard practice when testing optimizations. Once you have a baseline timing without optimization, you can assess each optimization separately and see its effect.
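
For reference, here is a minimal, self-contained timing sketch (my own example, not the original program; Nu and the matrix contents are made up for the test). It wall-clocks the plain serial loop nest to get a baseline; re-run it after adding the #pragma in whichever position you want to test and compare the times. Compile with something like: g++ -O2 -fopenmp timing.cpp

#include <cstdio>
#include <vector>
#include <omp.h>    // omp_get_wtime()

int main()
{
    const int Nu = 800;                                   // assumed test size
    std::vector<double> A(Nu * Nu, 1.0), D(Nu * Nu, 2.0), C(Nu * Nu, 0.0);

    double t0 = omp_get_wtime();                          // start wall-clock timer

    // Baseline: no pragma. Add "#pragma omp parallel for" above one of the
    // loops (with any clauses you think it needs) and time it again.
    for (int i = 0; i < Nu; i++)
        for (int j = 0; j < Nu; j++)
        {
            C[i * Nu + j] = 0.0;
            for (int k = 0; k < Nu; k++)
                C[i * Nu + j] += A[i * Nu + k] * D[j * Nu + k];
        }

    double t1 = omp_get_wtime();                          // stop timer
    std::printf("Nu = %d, elapsed = %f s, C[0] = %f\n", Nu, t1 - t0, C[0]);
    return 0;
}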

Once you have the timings, you need to _explain_ why they differ.

The answers to both of your questions are related. That's as much as I will say, given that it's an assignment.

By the way, that's a very interesting #pragma. Can you tell me which compiler you are using? I guess it's Intel icc; I would be super interested if it's gcc/g++.
It's part of a g++ program that does matrix multiplication.
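
One thing worth checking with g++ (a small sanity-check sketch of my own, not part of your program): the OpenMP pragmas are silently ignored unless you compile with -fopenmp, so before comparing timings it helps to confirm that OpenMP is actually active.

#include <cstdio>
#ifdef _OPENMP
#include <omp.h>    // only needed when OpenMP support is enabled
#endif

int main()
{
#ifdef _OPENMP
    // _OPENMP is defined by the compiler when OpenMP support is switched on
    std::printf("OpenMP enabled, up to %d threads\n", omp_get_max_threads());
#else
    std::printf("OpenMP NOT enabled; recompile with: g++ -fopenmp file.cpp\n");
#endif
    return 0;
}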