Well, the first problem with the return true is that you're leaking the bd allocations by not calling free_bd.
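A minimal sketch of one way to avoid that kind of leak: record the result in a flag and fall through to a single cleanup point instead of returning from the middle of the loop. Your real bd type and allocator aren't shown in the thread, so the struct layout, alloc_bd and any_negative below are hypothetical stand-ins; only the name free_bd comes from your code.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the poster's bd type; the real layout is not shown.
struct bd {
    double *box;
};

// Hypothetical allocator paired with free_bd.
bd *alloc_bd(int n) {
    bd *b = static_cast<bd *>(std::malloc(sizeof(bd)));
    if (!b) return nullptr;
    b->box = static_cast<double *>(std::calloc(n, sizeof(double)));
    if (!b->box) {
        std::free(b);
        return nullptr;
    }
    return b;
}

void free_bd(bd *b) {
    if (!b) return;
    std::free(b->box);
    std::free(b);
}

// Hypothetical worker: copies v into a scratch bd and reports whether any
// value is negative. Every path that allocated also reaches free_bd.
bool any_negative(const double *v, int n) {
    if (n <= 0) return false;      // nothing allocated yet, safe to return
    bd *b = alloc_bd(n);
    if (!b) return false;          // allocation failed, nothing to free
    bool found = false;
    for (int i = 0; i < n && !found; ++i) {
        b->box[i] = v[i];
        if (b->box[i] < 0.0) {
            found = true;          // set a flag instead of "return true" here
        }
    }
    free_bd(b);                    // single cleanup point
    return found;
}
```

The loop condition stops as soon as the flag is set, so you keep the early exit without skipping the cleanup.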
> the actual function should run on parallelism with pragma, etc so more complicated
Before you even think about that, you need to make sure the code is fully functional on small (but representative) data sets as normal sequential code.
You're already struggling to debug sequential code; premature parallelism just makes it impossible for you to reason about anything you observe.
https://www.cplusplus.com/forum/beginner/278284/
Your level of parallelism seems far too fine-grained.
polymer_intersecting doesn't need any parallelism, it just needs to be re-entrant.
That is, it depends only on its parameters and internal variables, and doesn't write to any global data.
Allocating memory is fine, so long as you release it.
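To illustrate the difference, here is a rough sketch contrasting a version that writes to shared data with a re-entrant one. I don't know how your polymers are stored, so I'm assuming (purely for illustration) that a polymer is a list of lattice-site ids and that "intersecting" means a site appears twice. Polymer, intersecting_shared and intersecting_reentrant are made-up names.

```cpp
#include <algorithm>
#include <vector>

// Assumption for illustration only: a polymer is a list of occupied site ids.
using Polymer = std::vector<int>;

// NOT re-entrant: every call writes to the same file-scope buffer, so two
// threads calling this at the same time would trample each other's data.
static Polymer g_scratch;

bool intersecting_shared(const Polymer &sites) {
    g_scratch = sites;
    std::sort(g_scratch.begin(), g_scratch.end());
    return std::adjacent_find(g_scratch.begin(), g_scratch.end()) != g_scratch.end();
}

// Re-entrant: depends only on its parameter and a local variable.
// The local vector allocates memory, but releases it when the function returns.
bool intersecting_reentrant(const Polymer &sites) {
    Polymer scratch(sites);
    std::sort(scratch.begin(), scratch.end());
    return std::adjacent_find(scratch.begin(), scratch.end()) != scratch.end();
}
```

Both give the same answers when called sequentially; only the second is safe to call from several threads at once.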
If you try to micro-manage this at too low a level, you're just going to thrash the system with expensive (compared to the actual amount of work done) context switches.
Then I would add just one #pragma omp parallel for to the for loop in _check_polymer_intersection.
That is, each parallel thread is responsible for checking an entire polymer for an intersection.
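Something along these lines, as a sketch only. The Polymer type and the body of polymer_intersecting are placeholders I made up (a duplicate-site check), and I've dropped the leading underscore from the outer function's name because such names are reserved at global scope in C++. The one pragma is the only parallel construct; without -fopenmp (or your compiler's equivalent) it is ignored and the loop simply runs sequentially.

```cpp
#include <algorithm>
#include <vector>

// Assumption for illustration only: a polymer is a list of occupied site ids.
using Polymer = std::vector<int>;

// Placeholder per-polymer check; re-entrant, no parallelism inside.
static bool polymer_intersecting(const Polymer &sites) {
    Polymer scratch(sites);
    std::sort(scratch.begin(), scratch.end());
    return std::adjacent_find(scratch.begin(), scratch.end()) != scratch.end();
}

// One parallel loop: each iteration checks one whole polymer.
bool check_polymer_intersection(const std::vector<Polymer> &polymers) {
    int found = 0;
    const int n = static_cast<int>(polymers.size());
    // Each thread gets a private copy of found (initially 0); the copies are
    // combined with logical OR when the loop finishes.
    #pragma omp parallel for reduction(|| : found)
    for (int i = 0; i < n; ++i) {
        if (polymer_intersecting(polymers[i])) {
            found = 1;
        }
    }
    return found != 0;
}
```

Note this version always checks every polymer rather than stopping at the first hit. That keeps it simple and correct, which is what you want while you're still comparing its results against the sequential code.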
Then you measure the performance to see how much of an improvement there is (if any).
Give each thread a useful amount of work to do, then you only pay the overhead price of thread management once.
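For the measuring step, a small wall-clock timer is usually enough. This is a generic sketch (elapsed_ms is a made-up helper name, not something from your code): time the sequential build and the OpenMP build on the same data and compare. It uses std::chrono::steady_clock because that measures elapsed wall time, whereas clock() reports CPU time, which on many systems is summed across threads and can make parallel code look no faster.

```cpp
#include <chrono>

// Runs work() once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F &&work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You'd call it as elapsed_ms([&] { /* run your check here */ }); and repeat the run a few times, since a single measurement can be noisy.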