simplify, debug.
first, checker serves no purpose.
for()
{
stuff()
if something, return true.
}
got here, above never happened, return false.
that becomes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
{
constauto nom = number_of_monomers();
constauto onom = other.number_of_monomers();
auto bd1 = allocate_bd(); auto bd2 = allocate_bd(); //allocating memory
for (int32_t i = 0; i < nom; i++)
{
get_mcoord(i, &bd1);
for (int32_t j = 0; j < onom; j++)
{
get_mcoord(j, &bd2);
if (check_gjk_intersection(&bd1, &bd2))
returntrue;
}
}
free_bd(&bd1); free_bd(&bd2); //it would be nice if this were automatic (smart pointer) ...
//if you do not use a smart pointer to fix this you need to free it on the return true also
returnfalse;
}
then debug it.
- set up the smallest problem you can where the things do NOT intersect.
- run it, see if you get false.
- if you get true, step through it to see where that happens, what exactly it compared or how it was incorrect.
I don't see anything in your pointers and the only reference I saw was other -- it seems correct on the surface.
you don't have to simplify it of course, its just that much less clutter when debugging. Up to you on that.
But I guess that on line 7 bd1 and bd2 are already pointer? Then on line 12/15/16 you have pointer to pointer which would be wrong, but the compiler should have complained. So it's hard to tell.
Thank you both very much. I did not simplify it because the function was intended for pragma use and I thought it is related with memory management. However as soon as I started to simplify it, I suddenly saw the obvious mistake as always (other.get_mcoord in the second loop). Sorry for that. But now I really need the actual help - I heard that memory management when using pragma operatives are more complex, one must allocate memory for each thread. Could someone maybe show the example of how exactly it should be properly done? Actually it works now as it is, but probably when allocating the memory for each thread it will work even more efficient?Or I am mistaken?
I have not done threading via pragmas -- not sure at all what these things do.
but ... memory should be allocated and freed as little as is possible. If you can re-use it (example: you know the max size and it is not so big that allocating it is bad, or if its the same size every time, etc) then do so.
you can lose a TON of time if you call a function in a tight loop and the function allocates memory, does a little work, then frees the memory, it can spend half or more of its time in the memory management!
Another way to handle this is to pass the memory around outside the inner logic. 'here, use this buffer' style. Then you can manage the memory at a higher level and make better decisions than you can do locally where you can't see what anything else is doing.
Maybe those will help you think through it, but I have nothing on how to use these tools optimally.
do you have a real use case for the code that you can see if it seems slow? Or are you trying to wring every last nanosecond out of it? If it isn't too slow for some requirement, and it works, move on.
Thanks. It is a real use case : I am generating tons of the molecules and I need to do them really fast, and the whole code will be used on the supercomputer so parallelism is a must.
So when in parallel, we need to allocate the memory for each thread separately:
OK.
you could allocate a giant block of 2*nthreads and pull pointers out of that instead but I dunno that it helps in any way. If its too slow, profile what takes time, focus on that...