I have been searching for this everywhere, but I can't find an answer that brings any clarity. It doesn't help that searching for "thread-safe" returns just about any forum page that includes the word "thread" (in other words, every forum page).
I have a C++ program which calls an external shell command. The program is multi-threaded, and the shell command can be started from multiple threads simultaneously.
The problem is that the program crashes at totally random places. This only happens when multi-threading is enabled and the external program is called from within the threads. It never crashes when multi-threading is disabled, or when the same multi-threading code is used for work that doesn't involve the shell command. I already redirected all of the program's output to /dev/null and disabled all output to cout from within the threads, to rule out problems there.
I initially used system() to call the shell command, but that proved not to be thread-safe. Then someone proposed an alternative using execl(), which I am currently using. But judging from the random crashes, that isn't thread-safe either.
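Since execl() replaces the image of the calling process, running a command from a thread normally means pairing it with fork() and waitpid(). Below is a minimal sketch of that pairing, assuming the command is a string for /bin/sh to interpret; run_shell_command is a hypothetical name, not code from the program described here. After fork() in a multi-threaded program, the child contains only the thread that called fork(), so it should call nothing but async-signal-safe functions (here execl() and _exit()) before the exec.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `command` through /bin/sh and returns its exit status,
// or -1 if the child could not be created or did not exit normally.
int run_shell_command(const char* command) {
    assert(command != nullptr);

    pid_t pid = fork();
    if (pid == -1) {
        return -1;  // fork() failed
    }
    if (pid == 0) {
        // Child: only async-signal-safe calls between fork() and exec.
        execl("/bin/sh", "sh", "-c", command, static_cast<char*>(nullptr));
        _exit(127);  // reached only if execl() failed; _exit skips atexit handlers
    }

    // Parent: wait for this specific child, retrying if a signal interrupts.
    int status = 0;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Waiting on the specific pid, rather than calling wait(), keeps one thread from reaping a child that another thread started.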
So my question is: I want to call a shell command in multiple threads of the same C++ program, and I want to do this in a guaranteed thread-safe way. Ideally it should be thread-safe on as many platforms as possible, but currently I'm happy with Linux 2.6.27 or newer. How do I do this?
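One alternative often suggested for multi-threaded programs is posix_spawn(), which creates the child and execs the new program in a single call, so the caller's own code never runs in a forked copy of the process. Below is a minimal sketch under these assumptions: spawn.h is available (it is on Linux with glibc), each thread only needs the command's exit status, and the names spawn_and_wait and run_all are hypothetical. Whether this avoids the crashes described here has not been verified.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <spawn.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <vector>

extern char** environ;  // the current environment, passed on to the child

// Starts `/bin/sh -c command` with posix_spawn() and waits for that child.
// Returns the exit status, or -1 on failure / abnormal termination.
int spawn_and_wait(const std::string& command) {
    // posix_spawn() takes char* const argv[]; the strings are not modified.
    char* argv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                    const_cast<char*>(command.c_str()), nullptr};
    pid_t pid;
    if (posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ) != 0) {
        return -1;  // posix_spawn() returns an error number instead of setting errno
    }
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);  // reap this child only
    } while (r == -1 && errno == EINTR);
    if (r == -1) return -1;
    assert(r == pid);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Runs every command on its own thread and collects the exit statuses
// in the same order as `commands`.
std::vector<int> run_all(const std::vector<std::string>& commands) {
    std::vector<int> results(commands.size(), -1);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        workers.emplace_back([&commands, &results, i] {
            results[i] = spawn_and_wait(commands[i]);  // each thread writes its own slot
        });
    }
    for (std::thread& t : workers) t.join();
    return results;
}
```

For example, run_all({"exit 0", "exit 2"}) returns {0, 2}. Building with -pthread is needed for std::thread on Linux.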
If the program is crashing when more than one instance of it is executed at once, I think there are two possibilities:
1. The program relies on some global resource, and parallel execution is creating race conditions between the instances.
2. Neither system() nor execl() is reentrant.
I don't have an alternative call to offer, but I do have a suggestion: if you can't solve this problem any other way, wrap every call to system()/execl() in a mutex lock/unlock pair, so that only one thread is inside the call at a time.
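The mutex suggestion might look like this in C++11, assuming the command is run through system(); locked_system and g_command_mutex are hypothetical names. The lock is held for the entire runtime of each command, so the commands run strictly one after another.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <sys/wait.h>

namespace {
std::mutex g_command_mutex;  // serializes every call to system()
}

// Calls system() while holding a process-wide lock, so at most one thread
// is inside system() at a time. Returns the command's exit status,
// or -1 on failure / abnormal termination.
int locked_system(const char* command) {
    assert(command != nullptr);  // system(nullptr) only checks whether a shell exists
    int status;
    {
        std::lock_guard<std::mutex> lock(g_command_mutex);
        status = std::system(command);
    }
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```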
Unfortunately, the whole point of multithreading my program was to be able to run multiple instances of the extremely time-consuming shell command at once. The command processes data, and by chopping this data into multiple chunks and processing those in parallel on a multi-core machine, I get a huge speed boost. So wrapping the call in a mutex would make it almost equivalent to running single-threaded.
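Serializing only the start of each command, rather than its whole run, may keep most of the speed-up: the lock is released as soon as the child process exists, and the waiting happens outside it, so the children still run in parallel. This is a minimal sketch, assuming the instability lies in process creation (which is not established here) and that each chunk is handled by one shell command; run_chunk_command, run_chunks_in_parallel, and g_launch_mutex are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <mutex>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>
#include <vector>

namespace {
std::mutex g_launch_mutex;  // held only while a child process is being created
}

// Starts `/bin/sh -c command` under the lock, then waits for it without the
// lock, so children started by different threads run concurrently.
// Returns the exit status, or -1 on failure / abnormal termination.
int run_chunk_command(const std::string& command) {
    const char* cmd = command.c_str();  // taken before fork(), no work in the child
    pid_t pid;
    {
        std::lock_guard<std::mutex> lock(g_launch_mutex);
        pid = fork();
        if (pid == 0) {
            // Child: only async-signal-safe calls until exec.
            execl("/bin/sh", "sh", "-c", cmd, static_cast<char*>(nullptr));
            _exit(127);
        }
    }  // lock released here; the child keeps running on its own
    if (pid == -1) return -1;

    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);  // waits on this child only, lock not held
    } while (r == -1 && errno == EINTR);
    if (r == -1) return -1;
    assert(r == pid);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Runs one command per chunk, each on its own thread, and returns the
// statuses in the same order as `commands`.
std::vector<int> run_chunks_in_parallel(const std::vector<std::string>& commands) {
    std::vector<int> results(commands.size(), -1);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        workers.emplace_back([&commands, &results, i] {
            results[i] = run_chunk_command(commands[i]);  // each thread writes its own slot
        });
    }
    for (std::thread& t : workers) t.join();
    return results;
}
```

If the crashes persist with only the launch serialized, that would point away from process creation and toward something else the parallel instances share.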
The strange thing is, the crashes seem to depend on the kernel version. The program has been running without problems for a year, and suddenly it started crashing after some system update. For instance, on 2.6.27.12-170.2.5.fc10.x86_64 it doesn't crash, while on 2.6.27.15-170.2.24.fc10.x86_64 it does.
There are some other functions, like popen(), which I could try, but I'm afraid they may rely on the same internal mechanism that causes the instability...
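For comparison, here is a minimal popen()/pclose() sketch, assuming the caller wants the command's standard output as a string; run_and_capture is a hypothetical name. POSIX specifies that popen() runs the command as if by /bin/sh -c, so, like system(), it still creates a child process and does not by itself settle whether the crash lies in process creation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Runs `command` through popen(), collects its standard output in `output`,
// and returns the exit status (or -1 on failure / abnormal termination).
int run_and_capture(const char* command, std::string& output) {
    assert(command != nullptr);
    output.clear();
    FILE* pipe = popen(command, "r");
    if (pipe == nullptr) return -1;

    char buffer[256];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }
    int status = pclose(pipe);  // waits for the child and returns its status
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```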