Please forgive the desperate tone of this post, but I've just had a very nasty experience with a C++ DLL, called from a C# application, running on a production server.
The DLL is called from multiple threads and had worked fine for weeks. I noticed that the old Windows Server 2003 machine I run it on had problems with OpenMP and other more recent features, so I disabled them in the compiler options.
Anyway, I looked at the server today and it was at 100% CPU, where it normally sits at around 6%. When I removed the DLL and reverted to the old C# code, the problem went away.
The incident hurt the performance of my application, it cost the business a good amount of money, and I cannot afford for it to happen again.
Is there a problem with the way I am allocating and de-allocating memory here that anyone can see, or anything else that looks suspicious? Is it possible the OpenMP directives are not being completely ignored by the compiler?
Thanks so much for any input...
extern "C" __declspec(dllexport) void RecursePlacesFast(double probs[], int placesLeft, double place[], int len)
{
double * copy;
copy = (double *)malloc(sizeof(double)* len);
bool * used;
used = (bool *)malloc(sizeof(bool)* len);
#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
memset(copy, 0, sizeof(double)* len);
memset(used, 0, sizeof(bool)* len);
Why are you dynamically allocating memory for these temporary buffers on every call? Not that it has much to do with your issue: per-call heap allocation can hurt performance through allocator overhead, but it won't spike the CPU by itself. You need to run a profiler and see where your code is actually spending its time. Even something as simple as Process Explorer will do.
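As for whether the OpenMP directives are really disabled, you don't have to guess: the compiler only defines _OPENMP when it is honoring the pragmas. A minimal sketch you could export alongside your routine (the OpenMpStatus name is just something I made up for the test):

#ifdef _OPENMP
#include <omp.h>   // only needed when OpenMP is compiled in
#endif

extern "C" __declspec(dllexport) int OpenMpStatus()
{
#ifdef _OPENMP
    // _OPENMP is defined only when the compiler honors the pragmas.
    return omp_get_max_threads();  // > 1 means the parallel for really fans out
#else
    return 0;  // the pragmas are being ignored, as you intended
#endif
}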
Would I be better off using something else to allocate a dynamic array? My understanding was that plain C++ arrays can only be statically sized.
If I know I'll never have more than 100 values in copy, would I be better off with...
copy = new double[100];
...and only using the indexes I need?
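For what it's worth, the idiomatic C++ answer to "a dynamic array" is std::vector, which sizes itself at runtime and frees itself automatically. A sketch against the same loop (I've used vector<char> for the flags because vector<bool> is a packed specialization with odd semantics):

#include <vector>

#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
    // Constructed fresh inside the loop body, so each thread works on
    // its own buffers, and they are freed automatically at the end of
    // every iteration.
    std::vector<double> copy(len, 0.0);
    std::vector<char> used(len, 0);
    // ... work on copy and used ...
}

Declared this way, the buffers are also private to each thread by construction, which matters for the parallel version of your loop.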
I have found one pretty major problem on second look, although with OpenMP turned off it shouldn't have caused this. This is what the code should look like to avoid multiple threads manipulating the same memory:
#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
    double * copy = (double *)malloc(sizeof(double) * len);
    bool * used = (bool *)malloc(sizeof(bool) * len);
    memset(copy, 0, sizeof(double) * len);
    memset(used, 0, sizeof(bool) * len);
    // ... rest of the loop body elided in the post ...
    free(used);
    free(copy);
}
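If the per-iteration malloc/free ever shows up as a cost in its own right, a further variant (just a sketch, not tested against your routine) is to allocate once per thread and reuse the buffers across that thread's iterations:

#pragma omp parallel shared(place)
{
    // Declared inside the parallel region, so each thread gets its own
    // pair of buffers, allocated once and reused for every iteration
    // that thread picks up.
    double * copy = (double *)malloc(sizeof(double) * len);
    bool * used = (bool *)malloc(sizeof(bool) * len);

    #pragma omp for
    for (int i = 0; i < len; i++)
    {
        memset(copy, 0, sizeof(double) * len);
        memset(used, 0, sizeof(bool) * len);
        // ... per-iteration work on copy and used ...
    }

    free(used);
    free(copy);
}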
To be brutally honest, OP, your code here is a tear-down and I'm not even going to go over it, other than to say that if you value your job you should not be using this in any environment where a colleague might see it.
In the interest of answering your question, though, memory allocation is not likely to be the cause of your issue. You need to find a profiler and see what is actually spiking your CPU. It might just be that a server still running 2003 doesn't have the kind of processing power you need, since the hardware could easily be more than ten years old.
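Even before you reach for a real profiler, a crude first check is to time the call itself from a little harness (TimeOneCall here is just a name I've made up; GetTickCount is the stock Win32 millisecond clock):

#include <windows.h>
#include <cstdio>

// Assumes RecursePlacesFast is declared as in the snippet above.
// Not a real profiler, but enough to confirm whether this routine
// is where the time actually goes.
void TimeOneCall(double probs[], int placesLeft, double place[], int len)
{
    DWORD start = GetTickCount();
    RecursePlacesFast(probs, placesLeft, place, len);
    printf("RecursePlacesFast took %lu ms\n",
           (unsigned long)(GetTickCount() - start));
}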
There isn't much of it, so please feel free to point out what you would put up after the tear-down!
I allocate memory within the loop, manipulate it, and then deallocate it. I have to do this inside the loop because the loop is marked parallel for OpenMP, and sharing the buffers across threads is what I think my problem was. What's wrong with that?
#pragma omp parallel for shared(place)
for (int i = 0; i < len; i++)
{
    double * copy = (double *)malloc(sizeof(double) * len);
    bool * used = (bool *)malloc(sizeof(bool) * len);
    memset(copy, 0, sizeof(double) * len);
    memset(used, 0, sizeof(bool) * len);
    // ... manipulate copy and used ...
    free(used);
    free(copy);
}
What happens if you add num_threads(2) to the OpenMP pragma and run it on the server? High core counts are a relatively recent innovation, and I'd imagine the cost of thread context switching was once higher than it is now.
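That is just an extra clause on the existing pragma; num_threads is standard OpenMP:

// Caps this loop at two threads regardless of the machine's core count.
#pragma omp parallel for shared(place) num_threads(2)
for (int i = 0; i < len; i++)
{
    // ... same loop body as before ...
}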