std::vector<type> slow to write to memory

I have a thread (which I will call the background thread) that scans a disk and loads its file names into a std::vector, inside a class called DiskClass. When it finds a new file it pushes the file details onto the vector and then increments a counter.

Meanwhile I have another thread (which I will call the worker thread) querying DiskClass, looking to see if there are any new files, and processing any it finds. It knows there are new files when DiskClass's counter increments.

The intention of this is to allow the DiskClass to populate in the background while the foreground task processes the files.

Anyway, sometimes (as far as I can tell) the worker thread catches up to the background thread, so it's reading a vector entry that the background thread wrote just milliseconds earlier. And sometimes, one time in a few thousand, it reads the latest vector entry as blank, EVEN THOUGH THE VECTOR IS PUSHED_BACK *BEFORE* THE COUNTER IS INCREMENTED in the background thread.

It never makes the same mistake twice, suggesting to me there is some sort of a race condition happening.

As far as I can figure it, the counter, a long integer, writes to memory faster than the vector (a std::vector of complex struct of std::strings), so that when the worker thread sees the integer increment and then goes to read the vector, the vector is still being written by the background thread, and thus I get erroneous results.

Does that sound right?

And short of waiting for the background thread to finish before I even start the worker thread (my current quick workaround, which 100% solves the erroneous results), and bearing in mind I don't want to merge the two threads into one synchronous process unless I fundamentally have to, what can I do? Is there some way to force vectors to be written to memory synchronously, so I can be sure the integer will never be stored first?

I've even noticed that the vector.size() increments faster than the actual payload. I imagine it's because the size is stored on the stack, just like my counter, and the payload itself is stored on the heap.

sorry for the long winded question, and for my lack of knowledge of threads.

thanks in advance

John


The easiest solution I can think of is to provide a mutex for the vector, which you should be doing anyway. As in, when adding to the vector, lock the mutex, add to the vector, increment the variable, and then unlock your mutex.

Then, your 'worker thread' will have to wait for the write to the vector to finish by obtaining the mutex before reading the value (and releasing the mutex once the stored data has been read). That way you make sure that the operations on the vector have completed before you keep going.

Also, as a side note: unless you have some sort of fence between the push_back statement and the increment, the compiler is well within its rights to swap the order of the statements, so the increment could happen before the push_back. On top of that, reading the counter at the same time it is being written to is dangerous (you could get a garbage value), which is one more reason to use a mutex.
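
As a rough illustration of the locking described above, here is a minimal sketch. The FileInfo struct, the member names and the run_mutex_demo helper are all made up for the example (the original DiskClass was not posted), so treat it as one possible shape rather than the actual class. The reader copies the entry out while holding the lock, because push_back may reallocate the vector and invalidate references.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the poster's file-details struct.
struct FileInfo {
    std::string name;
};

// Hypothetical sketch of DiskClass: one mutex guards both the vector and the counter.
class DiskClass {
public:
    // Background thread: the entry and the counter become visible together.
    void add_file(FileInfo info) {
        std::lock_guard<std::mutex> lock(mutex_);
        files_.push_back(std::move(info));
        ++count_;
    }

    // Worker thread: copies entry `index` into `out` if it has been published.
    bool try_get(std::size_t index, FileInfo& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (index >= count_) return false;
        out = files_[index];  // copy under the lock; a later push_back may reallocate
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::vector<FileInfo> files_;
    std::size_t count_ = 0;
};

// Runs one background and one worker thread; returns how many non-blank entries the worker read.
std::size_t run_mutex_demo(std::size_t total) {
    DiskClass disk;
    std::size_t non_blank = 0;

    std::thread background([&] {
        for (std::size_t i = 0; i < total; ++i)
            disk.add_file(FileInfo{"file" + std::to_string(i)});
    });

    std::thread worker([&] {
        std::size_t next = 0;
        FileInfo info;
        while (next < total) {
            if (disk.try_get(next, info)) {
                if (!info.name.empty()) ++non_blank;
                ++next;
            } else {
                std::this_thread::yield();  // nothing new yet; polling, as in the question
            }
        }
    });

    background.join();
    worker.join();
    return non_blank;
}
```

Calling run_mutex_demo(1000) from main should report that every entry the worker read was fully written. The worker still polls here; the mutex only guarantees it never sees a half-published entry.
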
Oh yes, a mutex. I used them everywhere else, I really should have thought of using them there. Simple! Thanks.
A mutex in conjunction with a std::condition_variable
http://en.cppreference.com/w/cpp/thread/condition_variable
I'm wondering why you're using a vector for this?

As the worker thread processes files, are you deleting those entries from the vector? If not, the size of your vector could get unwieldy. If you are deleting, you may be spending a lot of time copying the remainder of the vector to delete the first entry (assuming the worker thread is processing from the front). Likewise, you could be spending a lot of time reallocating and copying the vector if the background task is adding a significant number of files (presumably at the end). All those operations need to be protected by a mutex, as NT3 pointed out.

To me, a std::deque or std::list would be a more logical choice for this. Fast access for removing from the front and fast access for adding at the end.
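
To make the front-removal point concrete, here is a small single-threaded sketch using std::deque. The drain_front name is invented for the example. pop_front on a deque does not shift the remaining elements, whereas erasing the first element of a std::vector moves every element after it.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes entries off the front of `pending` in order
// and returns them, leaving `pending` empty.
std::vector<std::string> drain_front(std::deque<std::string>& pending) {
    std::vector<std::string> processed;
    while (!pending.empty()) {
        processed.push_back(std::move(pending.front()));
        pending.pop_front();  // constant time; no remaining elements are moved
    }
    return processed;
}
```

If the deque is shared between the two threads, it still needs the same mutex protection as the vector would.
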
> I'm wondering why you're using a vector for this?
> ...
> To me, a std::deque or std::list would be a more logical choice for this.

"When it finds a new file it pushes the file details onto the vector and then increments a counter" points towards a ring-buffer implementation. A vector providing the underlying storage for the ring-buffer would be more efficient than either a deque or a list.


> All those operations need to be protected by mutex as NT3 pointed out.

A mutex alone would not avoid race conditions (without great loss of efficiency) in a producer-consumer scenario; a std::condition_variable in combination with a std::mutex is canonical.
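
A minimal sketch of that combination, assuming a simple unbounded queue. The FileQueue class, its close() method and the run_queue_demo helper are invented for illustration, and a std::deque is used as the storage only to keep the example short; a ring buffer could sit behind the same interface. The worker sleeps inside wait() instead of polling a counter.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical producer-consumer queue of file names.
class FileQueue {
public:
    // Producer: add one entry, then wake a waiting consumer.
    void push(std::string name) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(name));
        }
        ready_.notify_one();
    }

    // Producer: signal that no more entries will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Consumer: blocks until an entry is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !pending_.empty() || closed_; });
        if (pending_.empty()) return false;
        out = std::move(pending_.front());
        pending_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> pending_;
    bool closed_ = false;
};

// Runs one producer (the calling thread) and one worker; returns what the worker saw, in order.
std::vector<std::string> run_queue_demo(std::size_t total) {
    FileQueue queue;
    std::vector<std::string> seen;

    std::thread worker([&] {
        std::string name;
        while (queue.pop(name)) seen.push_back(name);
    });

    for (std::size_t i = 0; i < total; ++i)
        queue.push("file" + std::to_string(i));
    queue.close();

    worker.join();  // after join, reading `seen` from this thread is safe
    return seen;
}
```

The predicate passed to wait() is re-checked on every wake-up, which covers spurious wake-ups and the case where the producer finishes before the worker starts waiting.
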

