Parallel Processing Help

Hello,

I am wondering if it is possible to break up one big loop into a number of smaller loops (depending on the number of cores in the system) and have each loop executed in parallel.

		for(short i=0; i<12; i++)     // big loop
		{
			.....
				myVariable+=resultsFromWorkout;
		}

		//*****************

		for(short i=0; i<3; i++)
		{
			....
				myVariable1+=resultsFromWorkout;
		}

		for(short i=3; i<6; i++)
		{
			....
				myVariable2+=resultsFromWorkout;
		}

		for(short i=6; i<9; i++)
		{
			....
				myVariable3+=resultsFromWorkout;
		}

		for(short i=9; i<12; i++)
		{
			....
				myVariable4+=resultsFromWorkout;
		}

		myVariable=myVariable1+myVariable2+myVariable3+myVariable4;


As the snippet above shows, the big loop accumulates its results in myVariable. Assuming we have four cores, the big loop is split into four smaller loops, each accumulating its own partial result, and the partial results are added together at the end.

My question is: what is the correct syntax for doing something like this?
Thanks in advance.
Not for small operations. The cost of handing work out to the cores and collecting the results will be greater than the calculation gains.
It will depend on what is in your "....".
Actually, the "..." is a very large block of code; each loop takes about 12 minutes :). And as I have mentioned, each loop is independent, so any increase in speed would be much appreciated.
Yes, it can be done as long as each FOR loop can run independently from the others, meaning that any given loop doesn't need data calculated in other loops.

In the end, you must synchronize, at the very least by waiting for all worker threads to finish, before you calculate the final result.
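To make the idea concrete, here is a minimal, portable sketch using C++11 `std::thread` rather than any platform API. It assumes the work is a plain summation over a `std::vector<double>`; the name `ParallelSum` and its parameters are made up for illustration. Each worker writes only to its own slot, and the caller joins every thread before adding up the partial results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sums `values` using up to `workerCount` threads (hypothetical helper).
// Each worker writes only to its own element of `partial`, so no locking is needed.
double ParallelSum(const std::vector<double>& values, unsigned workerCount)
{
    assert(workerCount > 0);
    const std::size_t n = values.size();
    // Never start more workers than there are elements (but always at least one).
    const std::size_t workers = std::min<std::size_t>(workerCount, n == 0 ? 1 : n);

    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> threads;
    threads.reserve(workers);

    const std::size_t chunk = n / workers;
    const std::size_t extra = n % workers;  // the first `extra` workers take one more element
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t end = begin + chunk + (w < extra ? 1 : 0);
        threads.emplace_back([&values, &partial, w, begin, end] {
            double sum = 0.0;
            for (std::size_t i = begin; i < end; ++i)
                sum += values[i];
            partial[w] = sum;
        });
        begin = end;
    }

    // Synchronize: wait for every worker before reading the partial results.
    for (std::thread& t : threads)
        t.join();

    double total = 0.0;
    for (double p : partial)
        total += p;
    return total;
}
```

A call might look like `ParallelSum(data, std::thread::hardware_concurrency())`. Note that `hardware_concurrency()` may return 0 when the core count cannot be determined, so a fallback value is needed in that case. With GCC or Clang this typically has to be built with `-pthread`.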
Thanks, webJose. Yes, each loop is independent of the others. Would you be able to elaborate a little more on the syntax, for example by providing some sort of template? I have no idea where to start. Let's say we have the following trivial example.
double MyFunc(double myArray[])
{
    double summation = 0;

    for (short i = 0; i < 12; i++)
        summation += myArray[i];

    return summation;
}


How would we rewrite this function? Thanks.
Maybe you want to take a look at multi-threading. <pthread.h> would be useful.
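For what it's worth, a rough sketch of the same summation with `<pthread.h>` could look like the following. It assumes a POSIX system; `SumTask`, `SumRange` and `PthreadSum` are names invented for this example, not part of any library. If a thread cannot be created, the sketch simply does that slice of work in the calling thread.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread work description: the half-open range [begin, end).
struct SumTask
{
    const double* data;
    int begin;
    int end;
    double sum;
};

// Thread entry point in the form pthread_create() expects.
extern "C" void* SumRange(void* arg)
{
    SumTask* task = static_cast<SumTask*>(arg);
    task->sum = 0.0;
    for (int i = task->begin; i < task->end; ++i)
        task->sum += task->data[i];
    return nullptr;
}

double PthreadSum(const double* data, int count, int threadCount)
{
    assert(count >= 0 && threadCount > 0);
    // Both vectors are sized up front so the pointers handed to threads stay valid.
    std::vector<SumTask> tasks(threadCount);
    std::vector<pthread_t> threads(threadCount);
    std::vector<char> started(threadCount, 0);

    for (int t = 0; t < threadCount; ++t) {
        // Integer arithmetic spreads any remainder across the threads.
        tasks[t].data = data;
        tasks[t].begin = static_cast<int>(static_cast<long long>(count) * t / threadCount);
        tasks[t].end = static_cast<int>(static_cast<long long>(count) * (t + 1) / threadCount);
        tasks[t].sum = 0.0;
        if (pthread_create(&threads[t], nullptr, SumRange, &tasks[t]) == 0)
            started[t] = 1;
        else
            SumRange(&tasks[t]);  // fallback: do this slice in the calling thread
    }

    // Wait for every started thread, then combine the partial sums.
    double total = 0.0;
    for (int t = 0; t < threadCount; ++t) {
        if (started[t])
            pthread_join(threads[t], nullptr);
        total += tasks[t].sum;
    }
    return total;
}
```

With GCC or Clang this is normally compiled and linked with `-pthread`.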
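#include <windows.h>

//Per-thread work item: which slice of the array to sum, and where to put the result.
struct ArrayData
{
    int *theArray;
    int lowerBound; //First index to process.
    int upperBound; //Last index to process (inclusive).
    int sum;        //Partial result written by the thread.
};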

//Thread entry point.  CreateThread() requires this exact signature.
DWORD WINAPI Calculate(LPVOID lpData)
{
    ArrayData &data = *static_cast<ArrayData*>(lpData);
    data.sum = 0;
    for (int i = data.lowerBound; i <= data.upperBound; i++)
        data.sum += data.theArray[i];
    return 0;
}

//Now you need to create one thread per core.
//I only program for Windows, so I don't know about pthread.h or *nix-based threading.

//This is an array of ArrayData structures that hold the data and the results.
//Needs to be accessible from several functions, so I declare it here.
ArrayData *workingSet;
int workingSetSize;
//Windows-specific:  Thread handles for later.
HANDLE *threadHandles;

const int totalElements = 12;

void StartWorkers()
{
// See http://stackoverflow.com/questions/150355/programmatically-find-the-number-of-cores-on-a-machine for the number of cores in a PC.

    SYSTEM_INFO si;
    GetSystemInfo(&si);
    workingSetSize = si.dwNumberOfProcessors;
    workingSet = new ArrayData[workingSetSize];
    //Bounds are inclusive, so a chunk of N elements spans lowerBound .. lowerBound + N - 1.
    const int chunkSize = totalElements / workingSetSize;
    workingSet[0].theArray = someArray; //Dunno your array variable, so substitute here as appropriate.
    workingSet[0].lowerBound = 0;
    workingSet[0].upperBound = chunkSize - 1;
    int i;
    for (i = 1; i < workingSetSize; i++)
    {
        workingSet[i].theArray = someArray; //Same as a few lines above.
        workingSet[i].lowerBound = workingSet[i - 1].upperBound + 1;
        workingSet[i].upperBound = workingSet[i].lowerBound + chunkSize - 1;
    }
    // This can be improved.  Up to you.
    //The last thread also takes whatever elements are left over.
    workingSet[workingSetSize - 1].upperBound += totalElements % workingSetSize;

    //Now the data for each thread is ready.  Start the threads.
    //This is Windows cuz I know Windows only.
    //If you use MS C++, use _beginthreadex() instead of CreateThread().
    DWORD dwThreadID;
    threadHandles = new HANDLE[workingSetSize];
    for (i = 0; i < workingSetSize; i++)
        threadHandles[i] = CreateThread(NULL, 0, &Calculate, &workingSet[i], 0, &dwThreadID);
}

....
//Now in the main function or some other function, after calling StartWorkers(), it must wait for
//all threads to finish.

    //Again, Windows-specific:
    WaitForMultipleObjects(workingSetSize, threadHandles, TRUE, INFINITE);
....
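On the "this can be improved" remark about the bounds: one possible improvement is to spread the leftover elements over the first few threads instead of giving them all to the last one. The sketch below is portable and separate from the Windows code; `IndexRange` and `SplitEvenly` are hypothetical names, and the ranges use inclusive bounds like `lowerBound`/`upperBound` in `ArrayData`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: an inclusive index range.
struct IndexRange
{
    int lowerBound;
    int upperBound;  // inclusive; lowerBound > upperBound means the range is empty
};

// Splits the indices 0 .. totalElements - 1 into `parts` consecutive ranges
// whose sizes differ by at most one element.
std::vector<IndexRange> SplitEvenly(int totalElements, int parts)
{
    assert(totalElements >= 0 && parts > 0);
    std::vector<IndexRange> ranges;
    ranges.reserve(parts);

    const int base = totalElements / parts;
    const int extra = totalElements % parts;  // the first `extra` ranges get one more element
    int lower = 0;
    for (int p = 0; p < parts; ++p) {
        const int size = base + (p < extra ? 1 : 0);
        ranges.push_back(IndexRange{lower, lower + size - 1});
        lower += size;
    }
    return ranges;
}
```

For 14 elements and 4 threads this gives ranges of 4, 4, 3 and 3 elements, rather than 3, 3, 3 and 5. Once all threads have finished, the final result is just the sum of the per-range partial sums.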
Thanks very much for your reply. I will give it a shot and let you know, thanks again.