Long-running computation "hangs" on MacOS (Mac mini M2)

So, I have a long-running computation that pretty much goes like this:
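#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cwchar>
#include <deque>
#include <thread>

// MAX_THREAD_COUNT, thread_routine(), completed and print_progress()
// are defined elsewhere in the program (not shown here).

int main()
{
	std::deque<std::thread> thread;
	const size_t threadCount = std::min(MAX_THREAD_COUNT, std::thread::hardware_concurrency());

	// Start the worker threads (at most MAX_THREAD_COUNT)
	for (size_t threadId = 0U; threadId < threadCount; ++threadId)
	{
		thread.emplace_back(thread_routine, threadId);
	}

	// Wake up roughly once per minute and report progress until the work is done
	while (!completed)
	{
		std::this_thread::sleep_for(std::chrono::minutes(1));
		print_progress(); // <-- this will print a time-stamp and the progress in percent to the terminal
	}

	// Wait for all worker threads to finish
	while (!thread.empty())
	{
		thread.front().join();
		thread.pop_front();
	}

	fputws(L"Finished.\n\n", stderr);
	return 0;
}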


I thought our new Mac mini "M2" would be a good device to let this run for a couple of days, because it is quiet and pretty efficient.

The only problem is that, after a while, the computation (or at least the progress update) seems to "hang" for some reason.

At the beginning, the progress update will be printed (roughly) once per minute, as is expected.

However, after about 20 minutes, I noticed that it often won't print any progress update for more than 10 minutes!

In the "Activity Monitor" (task manager) I still see 100% load on all 10 CPU cores... 🤔

The very same code running on my Windows (built with MSVC) or Linux (built with G++) machine does not exhibit this behavior. Progress updates come in regularly once per minute, even if I let the program run for several hours.

So, any idea what is going on here? Does MacOS somehow "throttle" my long-running process for some reason?
Is completed a std::atomic<bool>?

Are you doing proper thread synchronization (using std::mutex or similar) inside print_progress() when accessing any data that is shared with the other threads?
To rule out that it's just your terminal window that is blocking*, you might want to redirect the output to a file to see if it makes a difference.

* I remember having some issues like that on Linux a few years ago, but back then I was producing a lot of output, and the fact that the terminal window didn't have focus might have played a role; I'm not sure. I don't remember whether I could make it consume the output again simply by giving it focus or pressing Enter...
Otherwise perhaps it would be useful to use some kind of debugger to see if and where it's stuck.
Actually, the "main" loop condition is while (g_counter < TOTAL), where g_counter is a std::atomic_uint64_t.

The progress is simply computed as (g_counter / (double)TOTAL) * 100.0, about once per minute.

And the counter is the only state that is "shared" between the threads.
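For reference, a minimal, self-contained sketch of the setup described above: worker threads bump a global std::atomic_uint64_t, and the main thread polls it and prints the percentage. TOTAL, thread_routine() and run_computation() are hypothetical stand-ins (the real work and constants are not shown in the thread), and the one-minute interval is shortened to 10 ms so the sketch finishes quickly.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

// Placeholder amount of work; the real TOTAL is not shown in the thread.
constexpr std::uint64_t TOTAL = 200'000;

// The only state shared between the threads.
std::atomic_uint64_t g_counter{0};

// Hypothetical worker: thread `id` handles every `stride`-th work item and
// bumps the shared counter once per finished item.
void thread_routine(unsigned id, unsigned stride)
{
    for (std::uint64_t item = id; item < TOTAL; item += stride)
    {
        // ... the real per-item computation would go here ...
        g_counter.fetch_add(1, std::memory_order_relaxed);
    }
}

// Starts the workers, prints progress until the counter reaches TOTAL,
// joins the workers and returns the final counter value.
std::uint64_t run_computation(unsigned threadCount)
{
    if (threadCount == 0)
        threadCount = 1; // hardware_concurrency() may report 0

    std::vector<std::thread> threads;
    for (unsigned id = 0; id < threadCount; ++id)
        threads.emplace_back(thread_routine, id, threadCount);

    while (g_counter.load() < TOTAL)
    {
        // One minute in the real program; shortened for the sketch.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        const double percent = (g_counter.load() / static_cast<double>(TOTAL)) * 100.0;
        std::printf("progress: %.1f%%\n", percent);
    }

    for (std::thread& t : threads)
        t.join();
    return g_counter.load();
}
```

Calling run_computation(std::thread::hardware_concurrency()) from main() would mirror the original program; the main thread only reads the atomic counter, so no mutex is needed.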

So, I really don't see how the "main" thread could be blocked for longer than sleep_for(std::chrono::minutes(1)) 🤔

There's just nothing in the code that is supposed to block, except for the sleep_for().

I understand that sleep_for(x) suspends the thread for at least a period of x. Once the thread becomes "ready" again, it may take a moment before the thread actually executes, because of how scheduling works. But a delay of more than 10 minutes seems weird...
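One way to tell a late wake-up apart from delayed output is to time each sleep with the monotonic std::chrono::steady_clock and print the overshoot next to the time stamp. A minimal sketch; measure_oversleep() is a hypothetical helper, not part of the original program:

```cpp
#include <chrono>
#include <thread>

// Sleeps for `requested` and returns how much longer than requested the
// thread was actually suspended, measured with the monotonic steady_clock.
std::chrono::milliseconds measure_oversleep(std::chrono::milliseconds requested)
{
    using clock = std::chrono::steady_clock;
    const clock::time_point before = clock::now();
    std::this_thread::sleep_for(requested);
    const clock::time_point after = clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(after - before) - requested;
}
```

In the progress loop, measure_oversleep(std::chrono::minutes(1)) could replace the plain sleep_for(); an overshoot of several minutes would confirm that the main thread itself is being woken late, rather than its output being held back somewhere.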
Maybe the expression used to adjust the value of g_counter compiles to multiple atomic operations. For example
g_counter = g_counter + n;
might be problematic because it consists of a separate load and store. The solution would be to use std::atomic_fetch_add or an equivalent.
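To illustrate the difference, a sketch comparing the separate load/store with a single fetch_add() on a std::atomic_uint64_t. add_racy(), add_atomic() and hammer() are hypothetical helpers for the demonstration:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Racy: `counter = counter + n` is an atomic load followed by a separate
// atomic store, so two threads can read the same old value and one update is lost.
void add_racy(std::atomic_uint64_t& counter, std::uint64_t n)
{
    counter = counter + n;
}

// Correct: a single atomic read-modify-write. `counter += n` is equivalent
// (it also performs an atomic fetch-add with sequentially consistent ordering).
void add_atomic(std::atomic_uint64_t& counter, std::uint64_t n)
{
    counter.fetch_add(n);
}

// Runs `threads` threads that each add 1 to a fresh counter `iterations`
// times using `add`, and returns the final total.
std::uint64_t hammer(void (*add)(std::atomic_uint64_t&, std::uint64_t),
                     unsigned threads, std::uint64_t iterations)
{
    std::atomic_uint64_t counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (std::uint64_t i = 0; i < iterations; ++i)
                add(counter, 1);
        });
    for (std::thread& th : pool)
        th.join();
    return counter.load();
}
```

hammer(add_atomic, 4, 100000) always returns 400000; hammer(add_racy, 4, 100000) can come out lower, depending on timing, because some updates get overwritten. There is no undefined behavior in either case, since every individual load and store is atomic.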

I know you're using MacOS, but Microsoft's standard library had a defect where sleep_for used the system clock and was vulnerable to adjustments in the system time.
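Related to the clock question: the standard recommends that implementations measure the time for sleep_for() with a steady clock, but a periodic loop can also be written against explicit std::chrono::steady_clock deadlines with sleep_until(). A sketch under that approach; report_periodically() and its parameters are hypothetical, and whether the underlying wait really ignores wall-clock changes is still up to the standard library implementation:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Calls `report` once per `period`, `ticks` times, using absolute deadlines on
// the monotonic steady_clock, so a late wake-up does not shift all later deadlines.
void report_periodically(std::chrono::milliseconds period, int ticks,
                         const std::function<void(int)>& report)
{
    using clock = std::chrono::steady_clock;
    clock::time_point deadline = clock::now();
    for (int tick = 0; tick < ticks; ++tick)
    {
        deadline += period;
        std::this_thread::sleep_until(deadline);
        report(tick);
    }
}
```

Note the trade-off: if one wake-up is very late, the following deadlines are already in the past, so the next few reports fire back to back instead of being shifted.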
kigar64551 wrote:
There's just nothing in the code that is supposed to block, except for the sleep_for().

The kind of blocking that I was referring to would be std::cout blocking if the output destination is too slow to read the output (for whatever reason). But since your program doesn't produce any other output the first thing you would notice would be that the terminal stopped reading/displaying the output. The blocking of std::cout in your program would happen later when the buffer is full (I doubt it would happen after only 20 lines of output) and you wouldn't notice that it happened unless you used a debugger.
Maybe the expression used to adjust the value of g_counter compiles to multiple atomic operations. For example
g_counter = g_counter + n;
might be problematic because it consists of a separate load and store. The solution would be to use std::atomic_fetch_add or an equivalent.

I think for my std::atomic_uint64_t the += operator should be equivalent to value.fetch_add(n).
The kind of blocking that I was referring to would be std::cout blocking if the output destination is too slow to read the output (for whatever reason). But since your program doesn't produce any other output the first thing you would notice would be that the terminal stopped reading/displaying the output. The blocking of std::cout in your program would happen later when the buffer is full (I doubt it would happen after only 20 lines of output) and you wouldn't notice that it happened unless you used a debugger.

In my experience, printing to the terminal can become a bottleneck if we write a lot of lines in a short time. But printing a single line once per minute shouldn't be an issue at all, let alone block for 10 minutes. Also, I don't think it can be a "buffering" issue, because after ~10 minutes I get the next output line, but the "missing" lines (that should have been printed in the meantime) never show up.

Anyway, I started another run today, with stdout redirected to a logfile in one terminal window and tail -f logfile.txt running in another terminal window. So far, I don't see any "gaps" in the printed time-stamps, so things look good so far...
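On the redirect: when stdout goes to a file rather than a terminal, the C library normally makes it fully buffered, so lines may reach logfile.txt (and tail -f) in bursts unless the program flushes. A sketch of a progress printer that flushes after every line; print_progress_line() and its output format are hypothetical:

```cpp
#include <cstdio>
#include <ctime>

// Writes one time-stamped progress line to `out` and flushes it immediately,
// so a `tail -f` in another terminal sees each line as soon as it is written.
void print_progress_line(std::FILE* out, double percent)
{
    char stamp[32] = "";
    const std::time_t now = std::time(nullptr);
    if (const std::tm* local = std::localtime(&now)) // not thread-safe: main thread only
        std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", local);

    std::fprintf(out, "[%s] %.1f%%\n", stamp, percent);
    std::fflush(out);
}
```

Called as print_progress_line(stdout, percent), it produces lines like "[2024-01-01 12:00:00] 42.5%" regardless of whether stdout is a terminal or a file.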
Someone cheered too soon. Weird delays are still showing up:
https://i.imgur.com/diMMAeM.jpg
https://i.imgur.com/8POgEiF.jpg
In print_progress() (which is only executed by the main thread?), have a static variable which is incremented on each call, and output this value as part of the display. Irrespective of the timings, this value should be sequential with no value missing. If you get missing values, then there's a thread sync issue somewhere. If they are sequential with none missing, then there's an OS scheduling issue. It could be that the threads are too busy working for this thread to be woken up at the end of the sleep period.
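A sketch of that suggestion, assuming print_progress() is only ever called from the main thread; the g_counter and TOTAL definitions are placeholders so the snippet compiles on its own:

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <ctime>

std::atomic_uint64_t g_counter{0};    // incremented by the worker threads
constexpr std::uint64_t TOTAL = 1000; // hypothetical total amount of work

// Prints a call sequence number, a time stamp and the progress, and returns
// the sequence number. Gaps in the sequence would mean missed calls; a gap-free
// sequence with irregular time stamps means the thread was woken late.
std::uint64_t print_progress()
{
    static std::uint64_t callNumber = 0; // only touched by the main thread
    ++callNumber;

    char stamp[32] = "";
    const std::time_t now = std::time(nullptr);
    if (const std::tm* local = std::localtime(&now))
        std::strftime(stamp, sizeof stamp, "%H:%M:%S", local);

    const double percent = (g_counter.load() / static_cast<double>(TOTAL)) * 100.0;
    std::printf("#%llu [%s] %.1f%%\n", static_cast<unsigned long long>(callNumber), stamp, percent);
    std::fflush(stdout);
    return callNumber;
}
```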

In the threadCount declaration of the original code, shouldn't it be:

 
const size_t threadCount = std::min(MAX_THREAD_COUNT, std::thread::hardware_concurrency() - 1);


as main() uses one thread. Is the value of hardware_concurrency() correct for your system?

In print_progress() (which is only executed by the main thread?) have a static variable which is incremented on each call and output this value as part of the display. Irrespective of the timings, this value should be sequential with no value missing. If you get missing values then there's a thread sync issue somewhere.

main() has a local variable that is incremented by one on each progress update. Those values are sequential, with no missing values.

It's only the time-stamps that seem to have "unexpected" gaps. I wouldn't worry if the delta between successive time-stamps varied by a few percent. But every so often, I'm seeing delays that are 5 to 10 times longer than the "usual" delay.

And yes, print_progress() is only executed by the "main" thread.

The only "communication" between the threads is that the "worker" threads increment the global counter (std::atomic) every now and then.

In the threadCount declaration of the original code, shouldn't it be:
as main() uses one thread. Is the value of hardware_concurrency() correct for your system?

I think the "main" thread spends 99.9% of its time sleeping (or, at least, it is supposed to). Only wakes up once per minute to print the status update and then goes back to sleeping. So, I don't think it is necessary/reasonable to "reserve" one core just for the "main" thread 🤔

std::thread::hardware_concurrency() returns 10 on the 10-core machine (6 performance + 4 efficiency cores).

BTW: One thing I was wondering about is whether we should limit the number of threads to the number of "performance" cores.
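On that question, a sketch that asks macOS for the number of performance cores via sysctlbyname(). It assumes the hw.perflevel0.physicalcpu key, which as far as I know is available on Apple silicon from macOS 12 on; on other systems, or if the query fails, it falls back to hardware_concurrency(). worker_thread_count() is a hypothetical helper:

```cpp
#include <thread>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns the number of worker threads to start: the performance-core count
// on macOS if the (assumed) sysctl key is available, else the hardware thread
// count, and never less than 1.
unsigned worker_thread_count()
{
#if defined(__APPLE__)
    int perfCores = 0;
    size_t size = sizeof perfCores;
    if (sysctlbyname("hw.perflevel0.physicalcpu", &perfCores, &size, nullptr, 0) == 0 && perfCores > 0)
        return static_cast<unsigned>(perfCores);
#endif
    const unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
    return hw > 0 ? hw : 1;
}
```

On the 10-core M2 this would be expected to return 6, which matches the experiment with 6 worker threads further down.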
Depends upon scheduling. Yes, main() spends 99.9% of the time sleeping, but if all the other threads are at 100%, then waking up from a sleep may not be the most pressing thing for the OS to schedule (it is a Mac after all....). Although I agree that 10 minutes is a long time.

Try reducing threadCount and see what happens....
What background tasks are running (eg virus scan etc)? Sometimes my Windows PC almost freezes up completely for a few minutes whilst background virus scans are taking place (the hard disk light stays on continuously during this period).

Is memory usage constant, or is it starting to page memory after a while and potentially thrash?
What background tasks are running (eg virus scan etc)?

Nothing, except for the "standard" MacOS background processes that you can't (easily) get rid of.

No virus scanner is installed.

the hard disk light stays on continuously during this period

The Mac mini has a built-in SSD, but TTBOMK there is no "activity" LED. I don't think HDD/SSD is a bottleneck here, though.

Is memory usage constant, or is it starting to page memory after a while and potentially thrash?

Machine has 16 GB of RAM. Currently 5 GB are used. Usage looks pretty constant.
With only 6 "worker" threads on the 10-core machine, it is still happening:
https://i.imgur.com/TmrtqNG.png

(in the "Activity Monitor" I can see that it is using only the "performance" cores now)

At this point I'm going to assume it is some weird MacOS quirk. Maybe related to "performance" vs. "efficiency" core scheduling?
Now, this is getting kind of ridiculous 😆

sleep_for(std::chrono::minutes(1)) suspends the thread for more than 30 minutes:
https://i.imgur.com/zgM42TU.png
Is there a MacOS API to sleep?
Random StackOverflow thread that might be related:
https://stackoverflow.com/questions/49677034/thread-sleeps-too-long-on-os-x-when-in-background

Apparently, "App Nap" was the culprit for that person.
Is there a MacOS API to sleep?

After all, MacOS is based on Unix/BSD, so it has the POSIX API, including the <unistd.h> header with the usleep() function:
https://pubs.opengroup.org/onlinepubs/009696899/functions/usleep.html

Yeah, I can give it a try!
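Note that usleep() was marked obsolescent in POSIX.1-2001 and removed in POSIX.1-2008, and its argument is only specified for values below 1,000,000 microseconds, so a one-minute sleep would need a loop anyway. nanosleep() is the documented replacement. A sketch; sleep_seconds() is a hypothetical helper, and since libc++ most likely implements sleep_for() on top of nanosleep() already, it may well behave the same under App Nap:

```cpp
#include <cerrno>
#include <time.h>

// Sleeps for `seconds` using POSIX nanosleep(), resuming with the remaining
// time whenever a signal interrupts the sleep (EINTR).
// Returns 0 on success, -1 on any other error.
int sleep_seconds(long seconds)
{
    timespec request{};
    request.tv_sec = seconds;
    request.tv_nsec = 0;
    timespec remaining{};
    while (nanosleep(&request, &remaining) == -1)
    {
        if (errno != EINTR)
            return -1;
        request = remaining; // interrupted: sleep for whatever is left
    }
    return 0;
}
```

In the progress loop, sleep_seconds(60) would take the place of sleep_for(std::chrono::minutes(1)).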