Please forgive the desperate tone of this post, but I've just had a very nasty experience with a C++ DLL, called from a C# application, running on a production server.
The DLL is called from multiple threads and had worked fine for weeks. I noticed that the old Windows Server 2003 machine I run it on had problems with OpenMP and other more recent features, so I disabled them in the compiler options.
Anyway, I looked at the server today and it was at 100% CPU, where it normally sits at around 6%. When I removed the DLL and reverted to the old C# code, the problem went away.
The incident hurt the performance of my application, it cost the business a good amount of money, and I cannot afford for it to happen again.
Is there a problem with the way I am allocating and de-allocating memory here that anyone can see, or anything else that looks suspicious? Is it possible the OpenMP directives are not being completely ignored by the compiler?
Thanks so much for any input...
extern "C" __declspec(dllexport) void RecursePlacesFast(double probs[], int placesLeft, double place[], int len)
{
double * copy;
copy = (double *)malloc(sizeof(double)* len);
bool * used;
used = (bool *)malloc(sizeof(bool)* len);
#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
memset(copy, 0, sizeof(double)* len);
memset(used, 0, sizeof(bool)* len);
Why are you dynamically allocating memory for these temporary buffers on every call? Not that it has much to do with your issue: per-call heap allocation can hurt performance through allocator overhead, but it won't spike the CPU by itself. You need to run a profiler and see where your code is actually spending its time. Even something as simple as Process Explorer will do.
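As for whether the OpenMP directives are really disabled, you don't have to guess: the compiler only defines _OPENMP when it is honoring the pragmas. A minimal sketch you could export alongside your routine (the OpenMpStatus name is just something I made up for the test):

#ifdef _OPENMP
#include <omp.h>   // only needed when OpenMP is compiled in
#endif

extern "C" __declspec(dllexport) int OpenMpStatus()
{
#ifdef _OPENMP
    // _OPENMP is defined only when the compiler honors the pragmas.
    return omp_get_max_threads();  // > 1 means the parallel for really fans out
#else
    return 0;  // the pragmas are being ignored, as you intended
#endif
}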
Would I be better off using something else to allocate a dynamic array? My understanding was that plain C++ arrays can only be statically sized.
If I know I'll never have more than 100 values in copy, would I be better off with...
copy = new double[100];
...and only using the indexes I need?
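For what it's worth, the idiomatic C++ answer to "a dynamic array" is std::vector, which sizes itself at runtime and frees itself automatically. A sketch against the same loop (I've used vector<char> for the flags because vector<bool> is a packed specialization with odd semantics):

#include <vector>

#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
    // Constructed fresh inside the loop body, so each thread works on
    // its own buffers, and they are freed automatically at the end of
    // every iteration.
    std::vector<double> copy(len, 0.0);
    std::vector<char> used(len, 0);
    // ... work on copy and used ...
}

Declared this way, the buffers are also private to each thread by construction, which matters for the parallel version of your loop.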
I have found one pretty major problem on second look, although with OpenMP turned off it shouldn't have caused this. This is what the code should look like to avoid multiple threads manipulating the same memory:
#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
    double * copy = (double *)malloc(sizeof(double) * len);
    bool * used = (bool *)malloc(sizeof(bool) * len);
    memset(copy, 0, sizeof(double) * len);
    memset(used, 0, sizeof(bool) * len);
    // ... rest of the loop body elided in the post ...
    free(used);
    free(copy);
}
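If the per-iteration malloc/free ever shows up as a cost in its own right, a further variant (just a sketch, not tested against your routine) is to allocate once per thread and reuse the buffers across that thread's iterations:

#pragma omp parallel shared(place)
{
    // Declared inside the parallel region, so each thread gets its own
    // pair of buffers, allocated once and reused for every iteration
    // that thread picks up.
    double * copy = (double *)malloc(sizeof(double) * len);
    bool * used = (bool *)malloc(sizeof(bool) * len);

    #pragma omp for
    for (int i = 0; i < len; i++)
    {
        memset(copy, 0, sizeof(double) * len);
        memset(used, 0, sizeof(bool) * len);
        // ... per-iteration work on copy and used ...
    }

    free(used);
    free(copy);
}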
To be brutally honest, OP, your code here is a tear-down and I'm not even going to go over it, other than to say that if you value your job you should not be using this in any environment where a colleague might see it.
In the interest of answering your question, though, memory allocation is not likely to be the cause of your issue. You need to find a profiler and see what is actually spiking your CPU. It might just be that a server still running 2003 doesn't have the kind of processing power you need, since the hardware could easily be more than ten years old.
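Even before you reach for a real profiler, a crude first check is to time the call itself from a little harness (TimeOneCall here is just a name I've made up; GetTickCount is the stock Win32 millisecond clock):

#include <windows.h>
#include <cstdio>

// Assumes RecursePlacesFast is declared as in the snippet above.
// Not a real profiler, but enough to confirm whether this routine
// is where the time actually goes.
void TimeOneCall(double probs[], int placesLeft, double place[], int len)
{
    DWORD start = GetTickCount();
    RecursePlacesFast(probs, placesLeft, place, len);
    printf("RecursePlacesFast took %lu ms\n",
           (unsigned long)(GetTickCount() - start));
}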
There isn't much of it, so please feel free to point out what you would put up after the tear-down!
I allocate memory within the loop, manipulate it, and then deallocate it. I have to do this inside the loop because the loop is marked parallel for OpenMP, and sharing the buffers across threads is what I think my problem was. What's wrong with that?
#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
    double * copy = (double *)malloc(sizeof(double) * len);
    bool * used = (bool *)malloc(sizeof(bool) * len);
    memset(copy, 0, sizeof(double) * len);
    memset(used, 0, sizeof(bool) * len);
    // ... manipulate copy and used ...
    free(used);
    free(copy);
}
What happens if you add num_threads(2) to the OpenMP pragma and run it on the server? High core counts are a relatively recent innovation, and I'd imagine the cost of thread context switching was once higher than it is now.
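That is just an extra clause on the existing pragma; num_threads is standard OpenMP:

// Caps this loop at two threads regardless of the machine's core count.
#pragma omp parallel for shared(place) num_threads(2)
for (int i = 0; i < len; i++)
{
    // ... same loop body as before ...
}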