Remember that the power of threading lies in the fact that each thread behaves as though it has its own processor. In practice, a thread's processor will usually not be a separate *physical* processor, because there are rarely enough physical processors to go around for the number of threads you need to run simultaneously. Instead, the OS emulates multiple processors by giving each thread time slices on the real ones.
In the code you have shown us, you have two threads - the main thread T1 and a delegated thread T2 which you want to do the donkey work. When you use a short sleep, T1 is not giving T2 enough time to receive the incoming data before T1 terminates T2 and reads the number of bytes that T2 has received.
The problem lies in the design of your program. At line 27, if done == false, then most likely not all of the data has been received yet, and you terminate T2 anyway. If done is true, then the data has been received (hopefully) and you do not terminate T2.
I understand that the overall design of your program is meant only for learning purposes, but the design is poor. I think your design would be much neater if it used a semaphore class to start and stop the threads when required. Or, you may even be able to have T2 stop T1 while it receives the data, then start it again after.
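To give a rough idea of what I mean by signalling instead of sleeping, here is a minimal sketch. It uses a standard C++11 condition variable rather than a semaphore class, and it is not your code: ReceiveState, receive_all and receive_with_signal are names I made up, and the "receive" loop only counts bytes where a real program would call recv().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical state shared between T1 (main) and T2 (receiver).
struct ReceiveState {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;           // set by T2 once everything has arrived
    std::size_t ret_length = 0;  // bytes T2 has received so far
};

// Stand-in for T2: pretends to receive `total` bytes in chunks.
// A real program would call recv() inside this loop (assumption).
void receive_all(ReceiveState& s, std::size_t total, std::size_t chunk) {
    assert(chunk > 0);
    std::size_t got = 0;
    while (got < total) {
        std::size_t n = (total - got < chunk) ? (total - got) : chunk;
        got += n;
        std::lock_guard<std::mutex> lock(s.m);
        s.ret_length = got;
    }
    {
        std::lock_guard<std::mutex> lock(s.m);
        s.done = true;
    }
    s.cv.notify_one();  // wake T1 now that the transfer is complete
}

// T1: start T2, block until T2 signals, then read the byte count.
std::size_t receive_with_signal(std::size_t total) {
    ReceiveState s;
    std::thread t2(receive_all, std::ref(s), total, 64);
    std::size_t result = 0;
    {
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&s] { return s.done; });  // no sleep, no guessing
        result = s.ret_length;
    }
    t2.join();  // let T2 finish on its own instead of terminating it
    return result;
}
```

With this shape T1 never has to guess how long to sleep. Calling receive_with_signal(520) blocks until T2 reports that it is done, and only then reads ret_length.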
Also, rather than interacting directly with any OS socket API, you will be much better off in the long run if you invest some time into learning how to use a good network programming library such as...
http://www.cs.wustl.edu/~schmidt/ACE.html
Learning how to use this library will allow you to concentrate on networking, and you can almost forget about all the frustrating idiosyncrasies of the different OS socket APIs. It also provides many other classes, including threads and semaphores.
Back to your problem. Bearing in mind that threads execute concurrently, consider the following scenario...
(1) T1 starts T2 and delegates T2 to receive 520 bytes.
(2) T1 continues on to the while loop, and T2 begins receiving data, done == false.
(3) T1 reaches the if-statement and reads the value of done, finds it false because T2 is still receiving, and terminates T2 before it is "done".
(4) So now T2 has received 520 minus n bytes, and T1 has reached the point where it reads the value of ret_length.
So if T1 waits long enough (with a longer sleep), T2 has time to finish and T1 will see all the bytes that were sent. But if T1 does not sleep for long enough, it will only see the number of bytes that T2 had actually received before it was terminated.
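If you still want an upper limit on how long T1 waits, one option is to wait on the result with a deadline instead of sleeping for a fixed time. The sketch below is only an illustration using std::async and std::future from standard C++11. simulated_receive and receive_with_deadline are hypothetical names, and the 1 ms pause per 100-byte chunk is an arbitrary stand-in for network delay.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>

// Stand-in for T2: pretends to receive `total` bytes, 100 at a time,
// pausing briefly before each chunk to imitate network delay (assumption).
std::size_t simulated_receive(std::size_t total) {
    std::size_t got = 0;
    while (got < total) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        std::size_t n = (total - got < 100) ? (total - got) : 100;
        got += n;
    }
    return got;
}

// T1: wait for T2, but no longer than `timeout`.
// Returns the byte count, or -1 if T2 had not finished by the deadline.
long receive_with_deadline(std::size_t total, std::chrono::milliseconds timeout) {
    std::future<std::size_t> f =
        std::async(std::launch::async, simulated_receive, total);
    if (f.wait_for(timeout) != std::future_status::ready) {
        // T2 is not killed here: the destructor of a future returned by
        // std::async blocks until the task has completed.
        return -1;
    }
    return static_cast<long>(f.get());
}
```

Unlike a fixed sleep, wait_for returns as soon as T2 has finished, so a generous timeout costs nothing when the data arrives quickly. For example, receive_with_deadline(520, std::chrono::milliseconds(5000)) comes back with the full count well before the five seconds are up.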
Dave