lock-free queue usage

I've listened to Herb Sutter's YouTube talks on lock-free data structures and I wanted to try using a lock-free queue. I decided to use MPMCQueue.h, available on GitHub, having found numerous examples.

The reason I chose MPMCQueue.h is that it looked simple enough, at only about 180 lines of code.

The idea with a multi-producer multi-consumer lock-free queue is that multiple threads can push to and pop from the queue without holding a lock (and, in the unique_lock case, without a condition variable either).

The question I have is how to have multiple threads consume without spinning. Before using MPMCQueue.h I was using std::queue with a unique_lock and a condition variable.
The producer would then wake a waiting consumer with a notify. The benefit of a condition variable is that the thread sleeps while waiting, thus minimising CPU cycle usage.

Currently there are two threads, each in a while (true) loop. Inside the loop, a unique_lock with an associated condition variable verifies that the queue is indeed not empty before popping.
Something like this:

    while (true)
    {
        item _item;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_queue_.wait(lock, [this, &_item]{ return requests_.try_pop(_item); });
        }
        /*
            do some work with _item
        */
    }



So I am unsure how to have threads wait without a lock on a lock-free queue.
I am confused about how to go about using a lock-free queue, which I assumed would reduce workload.
I'm almost certain that the code you posted is wrong. The point of the lock is that all threads accessing the resource must use the same lock variable; that variable is what synchronizes the threads.
> So I am unsure how to have threads without a lock, on a lock-free queue.

See this example from boost (using boost::lockfree::queue).
http://www.boost.org/doc/libs/1_64_0/doc/html/lockfree/examples.html

Note that there is no lock.
Non-blocking data structures do not rely on locks and mutexes to ensure thread-safety. ...
Instead of relying on guards, non-blocking data structures require atomic operations (specific CPU instructions executed without interruption). This means that any thread either sees the state before or after the operation, but no intermediate state can be observed. ...
http://www.boost.org/doc/libs/1_64_0/doc/html/lockfree.html



> lock-free queue, which I assumed would reduce workload.

It would typically increase the overall workload (spinlocks can waste a lot of processor cycles).
When discussing the performance of non-blocking data structures, one has to distinguish between amortized and worst-case costs. The definition of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore lock-free data structures are not necessarily the best choice for every use case. In order to maximise the throughput of an application one should consider high-performance concurrent data structures.

Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion, which may be necessary in real-time applications. In general we advise to consider if lock-free data structures are necessary or if concurrent data structures are sufficient. In any case we advise to perform benchmarks with different data structures for a specific workload.
http://www.boost.org/doc/libs/1_64_0/doc/html/lockfree.html

Thanks for providing the links. I have not yet done any measurements as I am still coding. I am not in a position to assess amortized and worst-case throughput as yet, but I agree with you that I was jumping the gun in assuming reductions in workload.

In the lock-free queue boost example, the consumer includes two while loops:
    int value;
    while (!done) {
        while (queue.pop(value))
            ++consumer_count;
    }

    while (queue.pop(value))
        ++consumer_count;


I am unsure why two are required.
done is set to true once all the producer threads are finished.

The consumer threads would then come out of the first while loop; at that point in time the queue need not be empty (they may not have finished consuming all the elements that were pushed into the queue).

The second while loop is there to make sure that the residual elements that remained in the queue when done was set to true are consumed.
bluefrog wrote:
I decided to use MPMCQueue.h available on github

which one? Facebook's https://github.com/facebook/folly/blob/master/folly/MPMCQueue.h ?

bluefrog wrote:
lock-free queue , which I assumed would reduce workload.
JLBorges wrote:
It would typically increase the overall workload


Fedor Pikus had a great line in his CppCon 2015 talk: "incorrect lock-free algorithms have better performance".

bluefrog wrote:
I have is how to have multiple threads consume without spinning

Just consume: if that is the Facebook queue, call read(T&) and go do something else if it returned false. If your consumer has the need to spin, it's not lock-free anyway.
@Cubbi,

But if a queue (lock-free or not) is empty, then you have to wait, and waiting involves some resource usage.

So the question was really about how to minimise the impact of waiting.

For example, I have a TCP/IP server that accepts client connections sending SQL statements, the service layer within the TCP/IP server pushes the msg/socket (struct) onto a queue so that another layer within the application can deal with the database(s).

A consumer then pops as and when the queue is non-empty. Invariably, there are busy and quiet periods of activity. The consumer threads cannot be joined, since they must always wait.
All the examples I have seen, including the Boost lock-free queue example, either present a two-stage process or join the threads in order to complete their work, but the consumer threads in this case cannot be ended unless the entire server is shut down.
> But if a queue (lock-free or not) is empty, then you have to wait

Not if you're consuming those data in a lock-free algorithm. The point of a lock-free queue is that it has a non-blocking "try pop" interface.

> So the question was really about how to minimise the impact of waiting.

That's a different story. Your basic design dimensions are: sync waiting vs async waiting, and if sync, user-space vs kernel-space (spinlock vs mutex/condvar), or some blend in between.
Topic archived. No new replies allowed.