I wrote a raytracing program that takes a very long time to finish, so I have been trying to multithread it using pthread_create(). It works perfectly for small test images, but it stops working once I increase the resolution. Using some cout statements, I've found that it seems to stop creating new threads after a certain point. Does anyone know why that might be the case?
What I want to do is this: create a number of threads equal to the number of processors on the computer and assign each of them a ray to trace. Whenever one of them finishes tracing its ray, I'd like that thread to be assigned the next untraced ray. Since the time it takes to trace a single ray can vary enormously depending on what it hits, I want each thread to fetch a new ray as soon as it finishes its current one, not only once all of the threads have finished theirs.
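For reference, the scheme described above can be sketched with a fixed set of workers that each pull the next ray index from a shared atomic counter. This is only a minimal sketch under assumptions: `Job`, `worker`, `tracePixel`, `renderAll`, and the `WIDTH`/`HEIGHT` values are hypothetical stand-ins rather than names from the program, and `tracePixel` takes the place of the real `raytrace()`.

```cpp
#include <atomic>
#include <pthread.h>
#include <unistd.h>
#include <vector>

// Hypothetical image size; the real program uses IMAGEWIDTH and IMAGEHEIGHT.
const int WIDTH = 64;
const int HEIGHT = 48;

// Shared state handed to every worker.
struct Job {
    std::atomic<int> nextRay{0};  // index of the next untraced ray
    std::vector<int> image;       // one result per pixel
    Job() : image(WIDTH * HEIGHT, -1) {}
};

// Placeholder for the expensive raytrace(); just encodes the pixel position.
int tracePixel(int x, int y) { return x + y * WIDTH; }

// Each worker keeps claiming rays until none are left.
void* worker(void* arg) {
    Job* job = static_cast<Job*>(arg);
    for (;;) {
        int ray = job->nextRay.fetch_add(1);  // atomically claim one ray
        if (ray >= WIDTH * HEIGHT) break;     // nothing left to trace
        int x = ray % WIDTH;
        int y = ray / WIDTH;
        job->image[ray] = tracePixel(x, y);   // each index is written by one thread only
    }
    return nullptr;
}

// Starts one worker per online CPU, waits for all of them,
// and returns how many workers were actually started.
int renderAll(Job& job) {
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus < 1) cpus = 1;
    std::vector<pthread_t> threads;
    for (long i = 0; i < cpus; i++) {
        pthread_t t;
        if (pthread_create(&t, nullptr, worker, &job) == 0) {
            threads.push_back(t);
        }
    }
    for (pthread_t t : threads) {
        pthread_join(t, nullptr);  // release each thread's resources
    }
    return static_cast<int>(threads.size());
}
```

In this arrangement only as many threads as there are CPUs are ever created, no matter how many rays the image has. The program as posted takes a different approach and creates one thread per ray.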
Here's sort of an overview of what it's doing.
#include <iostream>
#include <pthread.h>
#include <unistd.h>
using namespace std;

struct initializerArgs {
    Ray r;
    int x, y, threadID, rayCount;
};

//global array of booleans regarding whether the system is computing this pixel or not
bool isComputing[IMAGEWIDTH][IMAGEHEIGHT];
//the finalized image file
Color image[IMAGEWIDTH][IMAGEHEIGHT];

Color raytrace (Ray ri) {
    Color output;
    //a whole lotta math that takes a very long time to compute and sets output
    return output;
}

void* threadInitializer(void* args) {
    initializerArgs* data = (static_cast<initializerArgs*>(args));
    // ============ COUT A ============
    cout << "inside thread " << data->threadID << " computing ray # " << data->rayCount << endl;
    image[data->x][data->y] = raytrace(data->r);
    //all computations are done
    isComputing[data->x][data->y] = false;
    pthread_exit(NULL);
}

int main (int argc, char** argv) {
    //This tells me how many CPUs are attached to the computer. Output has been verified.
    const int cpus = sysconf(_SC_NPROCESSORS_ONLN);
    //create an array of threads equal to the number of CPUs.
    pthread_t threads[cpus];
    //arrays telling the program what x and y values are being computed by each thread.
    int xt[cpus];
    int yt[cpus];
    //array of arguments to initialize the threads with.
    initializerArgs args[cpus];
    int i, j, k, xPtr, yPtr, rayCount;
    bool check = true;
    //since there are always many more height pixels than CPUs, preset xt and yt to the first pixels in the column.
    for (i = 0; i < cpus; i++) {
        xt[i] = 0;
        yt[i] = i;
    }
    //these values traverse the image concurrently with rayCount.
    xPtr = 0;
    yPtr = 0;
    rayCount = 0;
    while (check) {
        //check is set to false now, then set to true for any event that tells the system that the raytracer still has work to do
        check = false;
        for (i = 0; i < cpus; i++) {
            if (rayCount == IMAGEWIDTH * IMAGEHEIGHT) {
                //if there are no more rays to assign, just check if the processes are done computing yet, set check to true if not
                if (isComputing[xt[i]][yt[i]]) {
                    check = true;
                }
            } else {
                //since there are more rays to assign, we know the system isn't done yet.
                check = true;
                if (!isComputing[xt[i]][yt[i]]) {
                    //if this thread is not computing anything right now
                    args[i].r = getRay();
                    args[i].x = xPtr;
                    args[i].y = yPtr;
                    args[i].threadID = i;
                    args[i].rayCount = rayCount;
                    //record which pixel this slot now owns and mark it as in progress
                    xt[i] = xPtr;
                    yt[i] = yPtr;
                    isComputing[xPtr][yPtr] = true;
                    // ============ COUT B ============
                    cout << "beginning thread " << i << " computing ray # " << rayCount << endl;
                    pthread_create(&threads[i], NULL, threadInitializer, &args[i]);
                    rayCount++;
                    yPtr++;
                    if (yPtr == IMAGEHEIGHT) {
                        yPtr = 0;
                        xPtr++;
                    }
                }
            }
        }
    }
}
So, with IMAGEHEIGHT and IMAGEWIDTH set to a low value, I get a chain of "beginning thread [a number 0-11, since this computer has 12 CPUs] computing ray # [increasing sequentially from 0 to IMAGEWIDTH * IMAGEHEIGHT]" interspersed with the "inside thread" versions of those lines. Both report rayCount values all the way up to IMAGEWIDTH * IMAGEHEIGHT - 1, like you would expect. However, with a larger image, I get the "beginning thread" version all the way up to IMAGEWIDTH * IMAGEHEIGHT, but I only get "inside thread" versions up to about 32745.
That seems to tell me that it's not creating any more threads after that point, even though I'm limiting the program to 12 threads at a time. Supporting this idea, the number it stops at is suspiciously close to 32768, which is a power of 2, so it might be some kind of hard limit on the number of threads I can create. Am I doing something wrong in the way I close out my threads?
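One way to test that hypothesis is to look at what pthread_create() returns: it gives back 0 on success and an error number (for example EAGAIN when a resource limit is hit) on failure, and the listing above never checks it. Threads are also joinable by default, and a finished joinable thread keeps its resources until it is joined or detached, which may be relevant here since the listing does neither. The following is a minimal sketch, assuming hypothetical helpers `noop`, `createChecked`, and `spawnMany` that are not part of the program.

```cpp
#include <cstring>
#include <iostream>
#include <pthread.h>

// Trivial thread body standing in for the real per-ray work.
void* noop(void*) { return nullptr; }

// Wraps pthread_create() and reports its error code instead of ignoring it.
// Returns 0 on success or the error number that pthread_create() returned.
int createChecked(pthread_t* thread, void* (*fn)(void*), void* arg) {
    int rc = pthread_create(thread, nullptr, fn, arg);
    if (rc != 0) {
        std::cerr << "pthread_create failed: " << std::strerror(rc) << std::endl;
    }
    return rc;
}

// Creates `count` short-lived threads one after another, joining each one
// so its resources are released. Returns how many creations failed.
int spawnMany(int count) {
    int failures = 0;
    for (int i = 0; i < count; i++) {
        pthread_t t;
        if (createChecked(&t, noop, nullptr) != 0) {
            failures++;
            continue;
        }
        pthread_join(t, nullptr);
    }
    return failures;
}
```

If the same check were added around the pthread_create() call in the listing, an error message appearing near ray 32745 would suggest that thread creation itself is failing rather than the threads silently doing nothing.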