Well, the first problem with the return true is that you're leaking the bd allocations by not calling free_bd before you return.
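As a rough sketch of what I mean (I can't see your real types, so bd_t, alloc_bd and any_above here are made-up stand-ins, and I'm assuming free_bd takes a pointer to the thing it releases): record the answer in a local, fall through to a single cleanup point, and only then return.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real bd type. */
typedef struct {
    double *data;
    size_t  n;
} bd_t;

/* Hypothetical allocator paired with free_bd. */
bd_t *alloc_bd(size_t n)
{
    bd_t *bd = malloc(sizeof *bd);
    if (bd == NULL)
        return NULL;
    bd->data = calloc(n ? n : 1, sizeof *bd->data);
    if (bd->data == NULL) {
        free(bd);
        return NULL;
    }
    bd->n = n;
    return bd;
}

/* Assumed shape of free_bd: releases everything alloc_bd acquired. */
void free_bd(bd_t *bd)
{
    if (bd == NULL)
        return;
    free(bd->data);
    free(bd);
}

/* Illustrative only: true if any value exceeds limit.
 * A "return true;" inside the loop would skip free_bd and leak bd.
 * Instead, remember the result and leave through one exit path. */
bool any_above(const double *values, size_t n, double limit)
{
    bool found = false;
    bd_t *bd = alloc_bd(n);
    if (bd == NULL)
        return false;   /* a real version should report the failure */

    for (size_t i = 0; i < n; i++) {
        bd->data[i] = values[i];
        if (bd->data[i] > limit) {
            found = true;
            break;      /* leave the loop, but not the function */
        }
    }

    free_bd(bd);        /* runs on both the true and the false path */
    return found;
}
```

Running the sequential version under valgrind or with -fsanitize=address is a quick way to confirm nothing is still leaking.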
> the actual function should run on parallelism with pragma, etc so more complicated
Before you even think about that, you need to make sure the code is fully functional on small (but representative) data sets as normal sequential code.
You're already struggling with debugging sequential code; premature parallelism just makes it impossible for you to reason about anything you observe.
Your level of parallelism seems far too detailed.
doesn't need any parallelism, it just needs to be re-entrant.
That is, it depends only on its parameters and internal variables, and doesn't write to any global data.
Allocating memory is fine, so long as you release it.
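To make "re-entrant" concrete, here's a minimal sketch. The names site_t, cmp_site and chain_self_intersects are invented for illustration, and I'm assuming integer lattice coordinates, which may not match your data. The point is only the shape: inputs come in through parameters, scratch memory is private to the call, and it is released before returning.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical monomer position on an integer lattice. */
typedef struct {
    int x, y, z;
} site_t;

/* Lexicographic comparison of two sites, suitable for qsort. */
static int cmp_site(const void *a, const void *b)
{
    const site_t *p = a;
    const site_t *q = b;
    if (p->x != q->x) return (p->x > q->x) - (p->x < q->x);
    if (p->y != q->y) return (p->y > q->y) - (p->y < q->y);
    return (p->z > q->z) - (p->z < q->z);
}

/* Re-entrant: reads only its parameters, touches no globals or statics,
 * and works on a private copy that it frees before returning.
 * Returns true if two sites of the chain occupy the same position. */
bool chain_self_intersects(const site_t *sites, size_t n)
{
    if (n < 2)
        return false;

    site_t *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return false;   /* a real version should report the failure */

    memcpy(copy, sites, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_site);

    /* After sorting, any duplicate positions are adjacent. */
    bool hit = false;
    for (size_t i = 1; i < n && !hit; i++)
        hit = (cmp_site(&copy[i - 1], &copy[i]) == 0);

    free(copy);
    return hit;
}
```

Because every call owns its own copy, any number of threads can call this at the same time on different chains without locks.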
If you try to micro-manage this at too low a level, you're just going to thrash the system with expensive (compared to the actual amount of work done) context switches.
Then I would add just one #pragma omp parallel for
to the for loop in _check_polymer_intersection.
That is, each parallel thread is responsible for checking an entire polymer for an intersection.
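Something along these lines, as a sketch only. I don't know what your _check_polymer_intersection actually looks like, so polymer_t, polymer_intersects and check_all_polymers are hypothetical, and the per-polymer check is a deliberately naive placeholder. The one pragma is a parallel for, so the iterations (whole polymers) are shared out between threads, with a reduction to combine the per-thread answers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical data layout: a polymer is an array of lattice sites. */
typedef struct {
    int x, y, z;
} site_t;

typedef struct {
    const site_t *sites;
    size_t        n;
} polymer_t;

/* Sequential, re-entrant check of one whole polymer (naive O(n^2)). */
static bool polymer_intersects(const polymer_t *p)
{
    for (size_t i = 0; i < p->n; i++)
        for (size_t j = i + 1; j < p->n; j++)
            if (p->sites[i].x == p->sites[j].x &&
                p->sites[i].y == p->sites[j].y &&
                p->sites[i].z == p->sites[j].z)
                return true;
    return false;
}

/* True if any polymer intersects itself.
 * Each iteration handles one entire polymer, so a thread gets a
 * useful chunk of work per scheduling decision. */
bool check_all_polymers(const polymer_t *polymers, int count)
{
    int found = 0;

    /* Each thread gets a private 'found'; they are OR-ed together
     * at the end. Note: no break out of an OpenMP for loop. */
    #pragma omp parallel for reduction(||:found)
    for (int i = 0; i < count; i++) {
        if (polymer_intersects(&polymers[i]))
            found = 1;
    }

    return found != 0;
}
```

Build with -fopenmp on GCC or Clang. Without that flag the pragma is ignored and the same code runs sequentially, which is handy for checking that both builds give identical answers.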
Then you measure the performance to see how much of an improvement (if any) you get.
Give each thread a useful amount of work to do, then you only pay the overhead price of thread management once.