I have a piece of code that runs on several OpenMP threads. Basically, I have to optimise a number of "species" that are stored in a vector<Species*> called species_vec_shuffled. The size of this vector exceeds the number of threads.
If a thread would otherwise be left waiting once every species has been picked up for optimisation, I want it to keep working on its current species rather than sit idle. Here is how I have tried to achieve this (simplified for demonstration's sake):
void Trainer<Scheme>::Epoch() {
    /*
     * species_optimisation_started_vec / species_optimisation_ended_vec:
     * Vectors which store a list of booleans indicating whether
     * a given OpenMP thread has begun and ended running optimisation
     * generations on a species, whose index in each vector corresponds
     * to the index in species_vec_shuffled.
     */
    vector<bool> species_optimisation_started_vec(p_world->species_vec_shuffled.size(),false);
    vector<bool> species_optimisation_ended_vec(p_world->species_vec_shuffled.size(),false);

    #pragma omp parallel for schedule(dynamic,1)
    for (int l = 0; l < static_cast<int>(p_world->species_vec_shuffled.size()); l++) {
        species_optimisation_started_vec[l] = true; // flag the current species optimisation as started

        // run optimisation until some convergence criterion is met...

        species_optimisation_ended_vec[l] = true; // flag the current species optimisation as ended

        /*
         * At this point optimisation has ended. Check to see if all other
         * species have already been started, but not all have finished.
         * If this is the case, just run generations until it is no longer
         * the case (we have spare time on this thread, so might as well use
         * it)
         */
        while (
            find(species_optimisation_started_vec.begin(),species_optimisation_started_vec.end(),false) == species_optimisation_started_vec.end() && // all generations have started, but...
            find(species_optimisation_ended_vec.begin(),species_optimisation_ended_vec.end(),false) != species_optimisation_ended_vec.end()) // ... not all generations have finished
        {
            // continue running steps (these are very short)
        }
    }
}
This code starts running fine but stalls after a couple of seconds. I suspect it is getting stuck in the while loop at the bottom, but I don't understand how that is logically possible. Could someone with experience in multithreading explain why this might happen and how I can avoid it? Thanks!
If I understood you correctly, what you want to do is this:
1. Create n threads.
2. Assign to each thread one species to process. Keep a list of species not yet processed.
3. Every time a thread finishes with a species, it will remove one element from the list and process it.
4. The code finishes when all threads have finished and the list is empty.
I don't know how to do that in OMP, but the general approach is to have a JobManager class that holds a thread-safe queue, and to pass a JobManager pointer to each thread. The class should have a function that returns a job description taken from the front of the queue, or a flag telling the thread that the queue is empty.
Finally, the main thread just waits until all worker threads return.
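A minimal sketch of that idea, using std::thread rather than OpenMP. The names JobManager, NextJob and RunAllJobs are made up for illustration, and the "job description" is reduced to an index into the species vector; the real work is left as a placeholder comment.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical JobManager: a mutex-protected queue of job indices.
class JobManager {
public:
    explicit JobManager(int job_count) {
        for (int i = 0; i < job_count; ++i) jobs_.push(i);
    }

    // Writes the next job index into `job` and returns true, or returns
    // false once the queue is empty.
    bool NextJob(int& job) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        job = jobs_.front();
        jobs_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<int> jobs_;
};

// Runs `thread_count` workers over `job_count` jobs and returns how many
// times each job was processed (every entry should end up as 1).
std::vector<int> RunAllJobs(int job_count, int thread_count) {
    assert(job_count >= 0 && thread_count > 0);
    JobManager manager(job_count);
    std::vector<int> processed(job_count, 0);

    auto worker = [&manager, &processed]() {
        int job = 0;
        while (manager.NextJob(job)) {
            // Placeholder for the real work, e.g. optimising one species.
            processed[job] += 1;  // each index is handed out exactly once
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();  // main thread waits for workers
    return processed;
}
```

Calling RunAllJobs(20, 4), for example, should hand each of the 20 indices to exactly one of the 4 workers. Note that processed is a vector<int>, not a vector<bool>, so that different threads can safely write to different elements.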
1) provide a list of species which need optimising
2) create n threads
3) assign each thread a species to optimise
4) if any thread finds that all species have been allocated but not all have finished being optimised up to the convergence criterion, continue optimising the current species in the thread past the convergence criterion until all other species have been optimised.
I think what you are describing is essentially what I am doing, except that I'm not explicitly creating a class, but using 2 vectors to achieve (at least as far as I can tell) the same effect.
Ah, OK. That makes more sense.
I don't think the while at the end would give much of a theoretical advantage. Suppose you have a list of 10000 elements and OMP decides to use 5 threads. The first 9995 elements will be computed to standard convergence, because until then there is always some element that has not been started. Once one thread finishes with the fifth-to-last element, it will continue running until the remaining four elements are finished. The thread that finishes the fourth-to-last element will run a little less, and so on.
The advantage of that loop, assuming it works as you intended, is marginal at best. I can't tell whether it does: it looks like it should, but concurrency is hard to predict. Try commenting it out and see what happens.
The number of operations required to achieve convergence varies by several orders of magnitude between species, so commenting out the while loop actually reduces CPU usage from 100% to around 40%. I seem to have solved the problem, though, by adding critical directives around the vector modifications. I don't understand why this fixes it, but it seems to work:
void Trainer<Scheme>::Epoch() {
    /*
     * species_optimisation_started_vec / species_optimisation_ended_vec:
     * Vectors which store a list of booleans indicating whether
     * a given OpenMP thread has begun and ended running optimisation
     * generations on a species, whose index in each vector corresponds
     * to the index in species_vec_shuffled.
     */
    vector<bool> species_optimisation_started_vec(p_world->species_vec_shuffled.size(),false);
    vector<bool> species_optimisation_ended_vec(p_world->species_vec_shuffled.size(),false);

    #pragma omp parallel for schedule(dynamic,1)
    for (int l = 0; l < static_cast<int>(p_world->species_vec_shuffled.size()); l++) {
        #pragma omp critical (flag_species_optimisation_started)
        species_optimisation_started_vec[l] = true; // flag the current species optimisation as started

        // run optimisation until some convergence criterion is met...

        #pragma omp critical (flag_species_optimisation_ended)
        species_optimisation_ended_vec[l] = true; // flag the current species optimisation as ended

        /*
         * At this point optimisation has ended. Check to see if all other
         * species have already been started, but not all have finished.
         * If this is the case, just run generations until it is no longer
         * the case (we have spare time on this thread, so might as well use
         * it)
         */
        while (
            find(species_optimisation_started_vec.begin(),species_optimisation_started_vec.end(),false) == species_optimisation_started_vec.end() && // all generations have started, but...
            find(species_optimisation_ended_vec.begin(),species_optimisation_ended_vec.end(),false) != species_optimisation_ended_vec.end()) // ... not all generations have finished
        {
            // continue running steps (these are very short)
        }
    }
}
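A likely reason the critical directives help: std::vector<bool> is a bit-packed specialisation, so several flags share one machine word. Writing element l means reading, modifying and writing back that word, and the C++ standard specifically excludes vector<bool> from the guarantee that different elements of a container can be modified concurrently. Two threads setting neighbouring flags at the same time can therefore lose one of the updates, which would leave an "ended" flag false forever and keep the while loop spinning. The critical sections serialise the writes, but the find calls in the while condition still read the vectors unprotected, so the version above is probably not fully race-free either.

One way around this, sketched below under some assumptions, is to replace the two vectors with two integer counters that are only touched through OpenMP atomic operations (the read form needs OpenMP 3.1 or later). The function name RunEpoch and the extra_steps vector are hypothetical stand-ins for the real Epoch() and for the extra generations it would run.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Trainer<Scheme>::Epoch(): returns how many
// species were optimised to convergence.
int RunEpoch(int species_count, std::vector<int>& extra_steps) {
    assert(static_cast<int>(extra_steps.size()) == species_count);
    int started = 0;  // number of species whose optimisation has begun
    int ended = 0;    // number of species whose optimisation has converged

    #pragma omp parallel for schedule(dynamic, 1)
    for (int l = 0; l < species_count; l++) {
        #pragma omp atomic update
        started++;

        // ... run optimisation until the convergence criterion is met ...

        #pragma omp atomic update
        ended++;

        // Spare time: keep stepping while every species has been started
        // but at least one has not yet finished.
        for (;;) {
            int s, e;
            #pragma omp atomic read
            s = started;
            #pragma omp atomic read
            e = ended;
            if (s != species_count || e == species_count) break;
            extra_steps[l]++;  // placeholder for one short extra generation
        }
    }
    return ended;
}
```

Each iteration only writes its own extra_steps[l], and that is a plain vector<int>, so those writes do not collide. Comparing two counters is also cheaper than scanning both vectors with find on every pass through the loop.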