[SOLVED] Thread vs. non-threads in a DLL

Dec 6, 2008 at 2:28pm
A big hi to these forums from a new member here. :) I hope someone can answer my questions.

I'm currently writing a DLL for networking purposes. The idea behind the library is that the exported functions provide the calling application with an easy-to-use interface to collect information about connected peers, to pull received data from a queue and to send messages to peers in the network. Some low-level and higher-level things should however be taken care of by the DLL in the background, without intervention from the calling application.

So I gathered that I should start a single thread that polls the network interface (I'm using a nifty UDP sockets implementation I found on the net) in a continuous loop and processes incoming messages. System-level messages can then be handled by the library independently, while messages for the calling application are queued in a few globally declared data structures. The calling application can thus pull information from these queues in its own time. All exports for the caller should be synchronous (blocking): the caller requires an answer before continuing.

I'm relatively new to thread programming, and read a number of articles on the subject, but I haven't quite found any that answer my questions.

First of all, both the polling thread and the exported functions need critical sections when accessing the global data. However, I am unsure whether the exported functions, which are not threaded routines at all, can define critical sections. I don't want to turn them into threads. I just need a way to make sure they only set, look at or clear the data structures when the polling thread isn't. None of the exported functions would be called concurrently, as the calling application itself has a single thread. The critical section of an exported function would generally start before doing anything, and end where it returns data to the caller.

Second, I understand that entering and leaving critical sections comes at a performance penalty. Is it therefore worthwhile to define a number of critical sections that protect only specific sections of code, or is it better to use a single section that may also include operations on strictly local data?

Third, does anyone have suggestions for a threading implementation, preferably a cross-platform one? I'm relatively happy with anything that is above all fast and simply allows me to define critical sections where I see fit. The one in Boost seems to be among the more popular, but I'd appreciate hearing about people's experiences.

Thanks in advance for any suggestions or recommendations!
Last edited on Dec 15, 2008 at 12:25am
Dec 6, 2008 at 3:48pm
A critical section is just a piece of code protected by a lock to prevent multiple threads/processes from executing it simultaneously. Any piece of code can be a critical section.

The performance penalty of critical sections results from two things:
1) The process has to grab and release a lock. This is work that the process wouldn't otherwise have to do.
2) While one process owns the lock, no others may execute the critical section. This can slow things down if many processes get blocked continuously waiting for the lock.

At least when doing OS development, typically critical sections are limited in size to prevent contention. This is at the cost of adding more critical sections which makes verification of the program harder.
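As a rough illustration of keeping critical sections small, here is a sketch in standard C++11, which postdates this thread; Boost.Thread's `boost::mutex` with its `scoped_lock` is used in much the same way. The class `SharedInbox` and its methods are hypothetical. The lock covers only the accesses to the shared queue, while the purely local string formatting runs after the lock is released.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical shared inbox: filled by a background thread, drained by exports.
class SharedInbox {
public:
    void push(std::string msg) {
        std::lock_guard<std::mutex> guard(mutex_);  // critical section: one push
        queue_.push_back(std::move(msg));
    }

    // Returns an empty string when nothing is queued.
    std::string pop_formatted() {
        std::string raw;
        {
            std::lock_guard<std::mutex> guard(mutex_);  // critical section starts
            if (queue_.empty()) return std::string();
            raw = std::move(queue_.front());
            queue_.pop_front();
        }                                               // critical section ends
        // Local-only work happens outside the lock, so it never blocks the other thread.
        return "[msg] " + raw;
    }

private:
    std::mutex mutex_;
    std::deque<std::string> queue_;
};
```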

I'd use the boost threads library.

Dec 7, 2008 at 7:52pm
Thank you for your answer. I'm only running one thread; I just need to make sure no function calls interfere with it. Good to know the other code doesn't need to be threaded.

From what you describe, it seems to me that a lock is nothing more than a special kind of variable. That should make the overhead relatively small. But if my communication monitoring thread would run like this (pseudo code):

while (servicing) {

  critical_section_start();

  if (queue_has_messages()) {
      process_messages();
  }

  critical_section_end();

}


I'm wondering whether this would be overkill for querying the network queues. Checking for queued messages is a relatively fast operation, so most of the time the thread would be busy obtaining and releasing the lock and checking for messages rather than processing any. I'm unsure how time is allotted to other threads, apart from setting a priority, but this doesn't seem fair to them.

Is there a way to pause a thread momentarily? I've looked around, but I can only find ways to flag it to stop and start from another process, whereas I want to pause it with a timer; for example, I would call a sleep function in a non-critical section. I've read somewhere that sleep() isn't quite safe for threading; is this correct? What other methods are there?

Another question: if the servicing flag in the example is a boolean, would it be atomic enough to be set from another thread without causing race conditions (it's in a non-critical section)? It would be an elegant way to stop the thread at the end of the loop.

Again, thanks in advance for any answers.
Dec 7, 2008 at 8:10pm
Well, you hit it on the head -- this is an infinite busy loop that will not only consume all CPU, but do so with the lock held almost 100% of the time. You need to rethink this design.

Your thread should only wake up if there are messages in the queue. Consider using a counting semaphore for the number of messages in the queue, then your processor thread will block as long as the queue is empty and consume no CPU.
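The effect of a counting semaphore here can be sketched with a mutex and a condition variable from standard C++11 (a `std::counting_semaphore` only arrived in C++20). `BlockingQueue` is a hypothetical name. The consumer's `pop()` blocks without using CPU while the queue is empty and wakes up when `push()` notifies it.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical blocking queue: pop() sleeps (no CPU use) until a message arrives.
class BlockingQueue {
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            queue_.push_back(std::move(msg));
        }
        ready_.notify_one();  // wake one waiting consumer, like a semaphore "post"
    }

    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases the mutex while blocked and re-acquires it on wake-up;
        // the predicate guards against spurious wake-ups.
        ready_.wait(lock, [this] { return !queue_.empty(); });
        std::string msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> queue_;
};
```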

How threads are handled is an OS-dependent question. Linux pthreads, for example, are implemented as lightweight processes that are scheduled in the same way as "heavyweight processes", which is to say they are given a timeslice which by default is on the order of 100-150 milliseconds.

You should declare servicing as a sig_atomic_t. This type is guaranteed to be atomic, whereas a bool probably is but isn't guaranteed to be. At an absolute minimum you would need to declare the boolean volatile to prevent compiler optimizations that might break your code.
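One caveat worth knowing: the C standard's guarantee for `sig_atomic_t` concerns access from asynchronous signal handlers, and `volatile` does not order memory between threads. In C++11 and later, `std::atomic<bool>` is the portable type for a flag written by one thread and read by another. A minimal sketch, with hypothetical names `service_loop` and `stop_service`:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical service loop controlled by a cross-thread stop flag.
std::atomic<bool> servicing(true);
std::atomic<int> iterations(0);

void service_loop() {
    while (servicing.load()) {       // atomic read, visible across threads
        iterations.fetch_add(1);
        std::this_thread::yield();   // stand-in for the real polling work
    }
}

// Called from the application's thread; the loop exits at its next check.
void stop_service() {
    servicing.store(false);
}
```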

What OS are you using? I can speak only to Un*x platforms, not Windows. You are correct in saying that sleep() isn't safe for multi-threaded apps. Typically the threads package you are using will provide functions to sleep, etc.
Dec 7, 2008 at 10:48pm
Thanks again for your feedback!

You are correct; I really need the thread to wake up only when it has to. Ideally, the network layer would have a callback, triggered by the hardware or the underlying protocol, that fires a routine when data is received. I've checked the sockets API documentation to see whether anything like that exists, but no such luck: I need to poll it continuously myself (through the UDP library I'm using). That's why I (thought I would) need a thread. In this light, I'm unsure how a counting semaphore could help me; if I understand them correctly, a semaphore would still need to be incremented by something that polls the queue. The UDP library I use doesn't use threads.

Judging by the differences between platforms that you point out I may have a challenge here. I didn't know there were atomic types, that's certainly going to be useful.

The (cross-platform) Boost library I've been looking at doesn't seem to have any sleep functionality for threads. There are functions to start, stop and pause threads, but they would be triggered from other processes.

I'm using Windows, but the library might at some later stage be compiled for Mac OS or Linux. I try to plan ahead, so preferably I'd use a platform independent solution, but if need be I'll just have to make platform-specific implementations.

In Windows I have previously used, of all things, some Windows multimedia routine in the OS's API which happened to define timed callbacks. They were reasonably accurate and they allowed me to intermittently fire a process with an alarm time defined in milliseconds. But I doubt they were intended for the purpose I used them for, and I wonder if there is a better plan.
Dec 7, 2008 at 11:37pm
You can poll quite happily. When you check and there is no message, just tell the thread to Sleep(0). The 0 basically tells the scheduler that your thread is done and gives up the rest of its timeslice. This will not put much overhead on your CPU during polling.

I have used this technique for polling network connections from a thread before, and my network connection had to handle 100 million file transfers.

Sleep(x) is a Windows function, not part of the threading library.
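A portable counterpart of the Sleep(0) idea is `std::this_thread::yield()` in C++11. The sketch assumes a hypothetical non-blocking `try_receive()` callback standing in for the UDP library's poll call. Note that if no other thread is ready to run, the yield returns immediately, so an idle poller built this way can still show high CPU usage.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical polling loop: try_receive() stands in for the UDP library's
// non-blocking "is there a packet?" call and returns false when nothing is waiting.
void poll_with_yield(const std::function<bool()>& try_receive,
                     const std::atomic<bool>& running) {
    while (running.load()) {
        if (!try_receive()) {
            // Nothing to do: give up the rest of this timeslice (roughly Sleep(0)).
            std::this_thread::yield();
        }
    }
}
```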
Last edited on Dec 7, 2008 at 11:38pm
Dec 8, 2008 at 11:02am
Thank you. But I'm hesitant to use sleep(0); here is one argument against it:

http://blogs.msdn.com/oldnewthing/archive/2005/10/04/476847.aspx

I fear I haven't explained my case well enough, so I'll try to elaborate a bit.

The caller application and the DLL would have only two 'threads' in total: the caller's main application thread (there are no other threads in the main app) and the polling thread in the DLL, which sends and receives messages among the peers while the main application minds its own business. The caller pulls data from the DLL synchronously, waiting for the function calls to return. So yielding only goes on between the application thread and the polling thread (and other threads in the OS, of course).

So since there isn't much multithreading going on, would using sleep(x) for a specific amount of time be a suitable alternative after all? At least it would give me an easy way to tune the polling frequency. I know sleep(x) has a system-dependent resolution, but I'm looking at 100 to 200 calls per second at most, and once woken, the thread would poll the queue until there are no messages left anyway.

Note that I realise sleep(x) isn't cross-platform.
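The timed variant described here (sleep for a fixed interval, then drain the queue completely) might look like this in portable C++11; `try_receive()` is again a hypothetical callback that returns false when nothing is waiting. At 100 to 200 wake-ups per second the interval would be 5 to 10 ms, subject to the platform's timer resolution.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical timed poller: wake every `interval`, then drain everything that
// arrived before sleeping again. try_receive() returns false once nothing is left.
void poll_with_interval(const std::function<bool()>& try_receive,
                        const std::atomic<bool>& running,
                        std::chrono::milliseconds interval) {
    while (running.load()) {
        // Drain the queue; each call handles one message.
        while (try_receive()) {
        }
        std::this_thread::sleep_for(interval);  // actual delay may be longer
    }
}
```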
Last edited on Dec 8, 2008 at 11:03am
Dec 8, 2008 at 2:14pm
The resolution you are looking for (5-10ms) may not be provided. For example, Un*x sleep() has a granularity of 1 second. usleep() has a granularity of 1 ms, BUT, there are cases where the time unit is small enough that the kernel busy loops, which is not what you want.
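Since the achievable granularity varies by OS and hardware, it can be worth measuring it directly. This C++11 sketch times one `std::this_thread::sleep_for` call with `std::chrono::steady_clock`; the function name `measure_sleep_us` is hypothetical. The standard only promises that the sleep lasts at least the requested duration, so the interesting number is how much longer it actually takes.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Measures how long a requested sleep really takes on this machine, in
// microseconds. The result is at least the request; how much longer depends on
// the OS timer resolution and scheduler load.
long long measure_sleep_us(std::chrono::microseconds requested) {
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

Calling it with, say, 1 ms, 5 ms and 15 ms requests on the target machine gives a realistic idea of the achievable polling rate.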
Dec 9, 2008 at 6:14pm
Yes, I just realised that. The resolution is apparently hardware-related, so I don't have much of a choice anyway. In fact, the Windows multimedia timer I have used before has a higher resolution than the usual SetTimer provided by Windows, so it looks as if I need to use it once again. To be safe, I should probably settle for around 60 calls per second, a resolution of around 15 ms.

Which means I'm not going to use threads after all, but I will still have to define critical sections somehow, both in the polling code and in the exports. In other words, all exports need to be wrapped in a critical section:

export type myexport(var, var, var) {
    type localvar;

    start_critical_section();

    perform_actions_on_shared_resources();
    localvar = some_shared_resource;

    end_critical_section();

    return localvar;
}


That will give me something to do. :)
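One way to make sure the end of the critical section is never skipped (for example by an early return) is an RAII guard. The sketch below uses C++11's `std::lock_guard` on a `std::mutex`; the globals and the `get_peer_count()` export are hypothetical, and a real Windows DLL export would also need `__declspec(dllexport)` or a .def file.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical shared state owned by the DLL and updated by the polling thread.
static std::mutex g_shared_lock;
static double g_peer_count = 0;

// Called by the polling thread when peers connect or disconnect.
void set_peer_count(double n) {
    std::lock_guard<std::mutex> guard(g_shared_lock);
    g_peer_count = n;
}

// Exported to the caller; returns a double, matching the caller's allowed types.
extern "C" double get_peer_count() {
    std::lock_guard<std::mutex> guard(g_shared_lock);  // critical section starts
    double local = g_peer_count;  // copy while the lock is held
    return local;                 // guard's destructor ends the critical section
}
```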
Dec 10, 2008 at 6:21am
I don't know how the rest of your program is set up, but I may have an idea that will work (for Windows at least; I can't speak for other platforms).

What if you wrapped the information you are sending between the DLL and the main app in a struct? When the polling thread receives new information, it creates a new struct filled with that info and adds its pointer to a queue. When the main app checks the queue, the checking function could just return a pointer to the struct and then signal the polling thread to pop the first pointer in the queue. The easiest way to do this in Windows without critical sections is either a spin lock or (better, in my opinion) the 'Interlocked' functions (you can read about them here: http://msdn.microsoft.com/en-us/library/ms686360(VS.85).aspx ).
By signal, I mean using an interlocked function to atomically set a variable, read by the polling thread, that indicates it should pop the first pointer. Note that the function returning the pointer to the main app should wait until the polling thread has reset this variable (after popping the first pointer) before it actually returns a pointer; otherwise you could end up returning the same pointer twice. If you think this idea will work, just don't forget that the main application needs to free the memory, since the DLL doesn't know what happens to the memory pointed to by the returned pointers.
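The pointer-handoff idea can be sketched with a mutex-protected queue of `std::unique_ptr` (C++11), swapped in here for the Interlocked-based signalling described above. `Packet` and `PacketQueue` are hypothetical. Because the pointer is moved out and popped in one locked step, each record is handed to exactly one reader, and it is freed automatically when that reader drops it.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical message record created by the polling thread.
struct Packet {
    int peer_id;
    std::string payload;
};

class PacketQueue {
public:
    // Polling thread: allocate a record and queue its pointer.
    void push(int peer_id, std::string payload) {
        std::unique_ptr<Packet> p(new Packet{peer_id, std::move(payload)});
        std::lock_guard<std::mutex> guard(mutex_);
        queue_.push_back(std::move(p));
    }

    // Reader: pop the oldest pointer. Ownership moves to the caller, so each
    // record is handed out exactly once and freed when the caller drops it.
    std::unique_ptr<Packet> pop() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (queue_.empty()) return nullptr;
        std::unique_ptr<Packet> p = std::move(queue_.front());
        queue_.pop_front();
        return p;
    }

private:
    std::mutex mutex_;
    std::deque<std::unique_ptr<Packet>> queue_;
};
```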
Dec 11, 2008 at 2:10pm
That's a very interesting solution. But I should add that I don't have any control over the calling application, and its limited export interface does not support structs, let alone exchanging shared data: I can only pass and retrieve null-terminated strings and doubles. However, I can see how it would also work when the struct is kept entirely in the DLL.

Unfortunately I won't be able to use it, for other reasons. First, the struct can become rather large (especially on the server end), and I fear the copying could cause a performance drop. Second, the variables shared by the exports and the polling thread are mostly dynamic containers such as queues and maps. When an exported function pulls information from a queue, for example, the queue shrinks, which is good; but the exports will not always have drained the queue before the polling thread is ready to replace the existing struct with a new one, resulting in the same values being pulled again. Correct me if I am wrong!

Ideally, I would make the polling thread only "wait until any export currently executing has finished before entering the critical section". I know I can do this by defining critical sections in all exports, but it seems rather redundant. Waiting would be rather short, because the exported functions mostly just exchange some type-cast data and execute a few simple function calls. One thing that may help is that all exported functions are always called synchronously: at most one is executing at any given moment.

Anyway, thanks for sharing your thoughts, I will definitely keep the idea in mind for other projects.
Last edited on Dec 11, 2008 at 3:35pm
Dec 12, 2008 at 4:51pm
On a related note, would it be silly to handle the critical section with my own custom implementation using a semaphore and a spin lock?

For example, there is a single atomic variable used as a semaphore. The exports would never wait - they would just set the semaphore upon entering their critical section and release it when done. They're very short calls after all, and they're never concurrent.

The polling thread would check the value of the semaphore and spin until it is released, then enter the only section in the DLL that is defined as critical. I assume that a critical section means no other thread or process anywhere is allotted any time, including the main thread in the calling application. It wouldn't even need to touch the semaphore.

I wonder whether it is useful or wise to implement it this way, as it seems to me a very fast custom implementation, but I'm sure someone can shoot it down...

Edit: Never mind - I forgot I need to give up the polling thread's time share, and the only way it's going to do that is by sleeping. :(
Last edited on Dec 12, 2008 at 4:59pm
Dec 12, 2008 at 7:28pm
Critical sections are generally only for synchronizing threads within a single process; there are other methods that span multiple processes. A spin lock may be a good idea, though. I've used simple spin locks successfully before, just using a 'long' variable and the Windows 'Interlocked' functions. In fact, you can implement a simple spin lock in about three lines of code.

Edit: here is one possibility for a spin lock:
#include <windows.h>

volatile LONG SpinLock = 0;
#define ENTER_LOCK_  while (InterlockedExchange(&SpinLock, 1) == 1) {}
#define EXIT_LOCK_   InterlockedExchange(&SpinLock, 0)

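For comparison, the same busy-wait lock in portable C++11: `std::atomic<bool>::exchange(true)` returns the previous value, just as `InterlockedExchange(&SpinLock, 1)` does, so the loop spins while the old value was already `true`. The class name `BusySpinLock` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Portable counterpart of the InterlockedExchange spin lock.
class BusySpinLock {
public:
    void lock() {
        // exchange() stores true and returns the old value; keep spinning
        // while another thread already held the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};
```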

Given the new info, I may have a workable solution, provided you don't mind your polling thread being inactive for short periods.
But who is cleaning up the memory allocated by the polling thread for the new data it receives? You said you don't have any control over the main app, so how do you know when it is done with the data and it can be deleted?

Anyway, create a simple spin lock and two variables (they will be used for signaling). When the exported function is called, it sets the first signal variable, then spins until the second is set. In your polling thread (at the start or the end of the loop body, but inside the loop), you check the first signal variable; if it is set, set the second signal variable and spin until the first is reset (thus allowing the function to retrieve the data it needs). Then, at the end of the exported function, reset the second signal, then the first, and return, and your polling thread can continue on its way. Because you only check the signal once per pass through the polling thread's loop, even if the exported function is called very frequently, the polling thread still executes its loop all the way through once before letting the function run. That way, even if the function is called very often, your polling thread keeps looping frequently too.

NOTE: 1) Make sure all of those signal accesses are protected, either by a spin lock or by using the 'Interlocked' functions. 2) This is what I think will work in theory; since I am not actually writing and running the code, I can't vouch for its functionality.
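The two-flag handshake described above might be sketched like this with C++11 atomics, which make every flag access atomic as the note asks. All names (`g_request`, `g_granted`, `g_shared_value`, `poller_checkpoint`, `export_read_value`) are hypothetical, and the waiting loops call `std::this_thread::yield()` rather than spinning flat out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-flag handshake between one exported function and the poller.
std::atomic<bool> g_request(false);  // first signal: set by the export
std::atomic<bool> g_granted(false);  // second signal: set by the poller
int g_shared_value = 0;              // data guarded by the handshake

// Called once per pass of the polling loop, after its own shared-data work.
void poller_checkpoint() {
    if (g_request.load()) {
        g_granted.store(true);        // "you may touch the shared data now"
        while (g_request.load()) {    // park until the export has finished
            std::this_thread::yield();
        }
    }
}

// Exported function: ask, wait for permission, read, then release both flags.
int export_read_value() {
    g_request.store(true);
    while (!g_granted.load()) {
        std::this_thread::yield();
    }
    int copy = g_shared_value;   // safe: the poller is parked in poller_checkpoint()
    g_granted.store(false);      // reset the second signal first...
    g_request.store(false);      // ...then the first, releasing the poller
    return copy;
}
```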
Last edited on Dec 12, 2008 at 7:32pm
Dec 13, 2008 at 12:31am
Thank you for your elaborate reply!

I guess I'd better start from the beginning.

My goal is to provide a networking library for the calling application. The calling application has a scripting language that allows you to bind external function calls, provided that they follow strict formatting rules: the exported functions have a limited number of parameters, and the parameters themselves are either doubles or null-terminated strings. The return type, too, can only be one of the two. The calling application's scripting language cannot handle pointers (even though strictly speaking the strings are pointers), so I have no direct memory access. I can't even define struct-like data types.

The plan is to provide the scripting language of the main app with some new functionality using a DLL. The exports give the main app a series of functions to start a networked session, check the connection status, check which peers are connected, send messages to one or more peers and read received messages, pulling them one by one from the queues filled by the polling thread.

Especially for such an entity as the server, I need a polling thread - it needs to relay messages to peers transparently without interaction from the scripting language, mostly because a library is much more efficient at this than the scripting language.

So, the thread is going to manage a number of maps, queues and other variables in the background. This data is all kept in the DLL's private memory, which the exports can read from. Using these exports, the calling application will be able to read, set or pull information from these variables and dynamic arrays. Most of the exported functions will actually make a small copy (obviously of a double or string, not of an array), then do some type casting so that the calling app receives data in its preferred format. Some exports may however shrink or clear queues when popping data from them.

Since the calling app has only one (main) thread, I know none of the exports will ever be called concurrently. But the background thread (which is started by the main app through one of the exports) can interfere with their access to the shared data. That's where I need to define some critical sections.

When it comes to your suggestion, I have two questions:

1. I'm not sure why I would need two sync variables. I would think one atomic variable would do the trick: if it's flagged, the polling thread waits until it is cleared before entering its critical section. (Couldn't I just use one variable whose value indicates which of the two "owns" the critical section?)
2. How long would the spin lock spin before the OS decides its time share has elapsed? I would think it will just spin away for a while before finally giving the control back to another thread, such as the currently called exported function. Wouldn't it be better to concede that "I'm obviously waiting for something else to finish first, so let's return to the OS and give that thread a chance"?

Again, thanks for your efforts, I really appreciate all the feedback I've been given here.
Dec 13, 2008 at 2:01pm
The deal with the two sync variables is this: yes, when you set the first, the polling thread will wait before entering its critical section... unless it's already inside its critical section, in which case there is still a little time before it starts spinning during which it can access shared memory. During that time, the exported function, if it didn't wait for the second variable, could access the shared memory concurrently with the polling thread... not exactly what you want! That's why there are two variables.

I don't know if it's technically correct, or could cause any problems, but I just got this nifty idea to use as a spin lock:
#include <windows.h>

volatile LONG SpinLock = 0;
#define ENTER_LOCK_  while (InterlockedExchange(&SpinLock, 1) == 1) { SwitchToThread(); }
#define EXIT_LOCK_   InterlockedExchange(&SpinLock, 0)

The SwitchToThread() function yields execution of the calling thread and has the OS check whether there is another thread ready to run; if there isn't, the current thread simply continues. So your spin lock checks whether it has to spin, and if so, it instead yields to the OS and lets other threads run.
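A common variation on the SwitchToThread() idea spins briefly and yields only after a number of failed attempts, so very short waits avoid a trip through the scheduler. This is a sketch in portable C++11, with `std::this_thread::yield()` standing in for SwitchToThread(); the class name and the spin count of 64 are arbitrary assumptions. Because it provides `lock()` and `unlock()`, it also works with `std::lock_guard`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Spin lock that yields, like the SwitchToThread() version, but only after a
// short burst of spinning (kSpinsBeforeYield is an arbitrary choice).
class YieldingSpinLock {
public:
    void lock() {
        int spins = 0;
        while (locked_.exchange(true, std::memory_order_acquire)) {
            if (++spins >= kSpinsBeforeYield) {
                spins = 0;
                std::this_thread::yield();  // let the lock holder run
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    static const int kSpinsBeforeYield = 64;
    std::atomic<bool> locked_{false};
};
```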
Dec 15, 2008 at 12:24am
Ah, I see now - I thought a critical section meant that no other thread whatsoever would be allowed to run until it's finished, as if time sharing would be temporarily disabled. I guess this just isn't the case.

I'll probably first try the Windows critical sections, but I'll want to test the custom spinlocks with SwitchToThread() as well - might just gain some performance there.

I'll mark this thread as solved. Thank you all for your suggestions and feedback!