Deadlock

Hi all,

The following snippet of code occasionally ends up in a deadlock. Can anyone figure out why?

mutex mutex_;
condition_variable condvar;
bool tM_continue = false; // main thread

thread t1 { [&](){
    f(x); // sequential operation...

    { // sync, restrict the scope
        tM_continue = true;
        condvar.notify_all();
        unique_lock<mutex> xlock(mutex_); // <-- t1 blocked here
        ...
    }

}};

{ // sync
    unique_lock<mutex> xlock(mutex_);
    condvar.wait(xlock, [&](){ return tM_continue; }); // <-- main blocked here
    tM_continue = false;
}

...


Linux x86/64, GCC 10.x
"Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread."
https://en.cppreference.com/w/cpp/thread/condition_variable

Probably this is what is required:

thread t1 { [&](){
    f(x); // sequential operation...

    { // sync, restrict the scope
        unique_lock<mutex> xlock(mutex_);
        tM_continue = true; // modified under the mutex
    }

    condvar.notify_all(); // "(the lock does not need to be held for notification)"
    ...

}};
Hi there,

thanks for the quick reply!
The rule is there and I cannot challenge it. I do wonder what the reason behind it is, though, as I tend to write my own locks [...].

The only scenario I can think of that leads to a deadlock is one where some of these statements have been reordered:
        tM_continue = true;
        condvar.notify_all();
        unique_lock<mutex> xlock(mutex_);


Intel x86-64 cannot reorder these instructions because, well, they must all involve some kind of store [1]. That leaves the compiler as the suspect, even when the source code is built with -O0. So, if a compiler barrier were added between the three statements, would the resulting code be correct and deadlock-free?

[1] Intel 64, Software Dev. Manual, Volume 3, Section 8.2.3.2
This is my understanding (I'm not an expert on locks, so it may not be completely accurate):

If the flag is modified without holding the mutex, the waiting thread may have checked the predicate but not yet reached the wait state on the condition variable (it may be just before it) when the first thread issues the notify call. Condition variables do not remember events, so that notification is lost and the waiter then sleeps with nobody left to wake it.

If the flag is modified under the mutex, the modification cannot land between the receiver's predicate check and its wait. The receiver either sees the updated flag when it checks, or it is already in the waiting state when the notification is sent.
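
To make that concrete, here is a minimal, self-contained sketch of the handshake with the flag written under the mutex. The names run_handshake, ready and payload are hypothetical (ready plays the role of tM_continue), and the constant 42 is just a placeholder for whatever f(x) produces; none of this is taken from the original program.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: runs one worker-to-main handshake and returns
// the value the worker published.
int run_handshake() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false; // plays the role of tM_continue
    int payload = 0;    // stands in for the result of f(x)

    std::thread worker([&] {
        int result = 42; // placeholder for the sequential work
        {
            std::lock_guard<std::mutex> lock(m);
            payload = result;
            ready = true; // modified under the mutex
        }
        cv.notify_all(); // the lock does not need to be held here
    });

    {
        std::unique_lock<std::mutex> lock(m);
        // The predicate is evaluated with the lock held, and wait() releases
        // the lock and blocks as one atomic step, so the worker cannot set
        // the flag in the gap between the check and the sleep.
        cv.wait(lock, [&] { return ready; });
        assert(lock.owns_lock()); // wait() returns with the lock re-acquired
        ready = false;
    }

    worker.join();
    return payload;
}
```

Calling run_handshake() from main in a loop is a cheap way to convince yourself it does not hang, although a stress loop can only make a lost wakeup unlikely to go unnoticed, not prove its absence. Note also that writing a plain bool from one thread while another reads it without synchronization is a data race in C++, so a compiler barrier alone would not make the original version correct.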
Hi, thanks for the clarification! I now understand the issue.