Voice Chat - Low Latency Peer to Peer

I am trying to make a voice chat application.
As I don't know much about programming, I am using some sample code I found.

The sample code has two problems, though.

1. It has high latency. I think that is because it uses DirectSound; I would like to use WASAPI or XAudio2 instead, as I think those have lower latency.
But then again, as I said, I am not that familiar with the code.

2. It stops receiving/playing audio when the window is not in focus.
I have no idea why, but I am guessing the fix is very simple.

http://www.sendspace.com/file/vnrt22

There is the code.

I have been searching and trying my best to find out how to switch to WASAPI, but I don't know how, which really frustrates me.
1, it has high latency (I think it's because it's using DirectSound)

DirectSound can have extremely low latency. I've streamed DS buffers with ~15 to 20 ms of latency before. And I'm sure you can get it even lower if you push harder. So I very much doubt that is the problem.

Of course you're not going to get such a low latency with a P2P client because it takes a while for the data to travel over the network.

2, it stops receiving/playing audio when the window is not in focus.


THAT is because of DirectSound. Specifically, it sounds like you have the cooperative level set to "Exclusive" mode. Try setting it to "normal" mode instead:

http://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.idirectsound8.idirectsound8.setcooperativelevel%28v=vs.85%29.aspx
Oh, well, that is probably how it works.
I don't understand much of the code; I have modified it a bit, but that's pretty much it.
The person I want to connect to has a very low ping to me, and I to him, of course:
3-4 ms, to be exact.

So the latency at the network level is lower than the rest, which is why I want to keep the audio latency to a minimum.

THAT is because of DirectSound. Specifically, it sounds like you have the cooperative level set to "Exclusive" mode. Try setting it to "normal" mode instead:


The code is C#; I forgot to mention that.
I am trying to convert it to C++, but I don't know how to use DirectSound from it.

But is C# going to limit anything in this type of application?


As for the code you were talking about:

device = new Device();
device.SetCooperativeLevel(this, CooperativeLevel.Priority);

That is the only cooperative level I have in the code.
I tried changing it, and it doesn't make any difference.

But from what I see, it isn't the type you are talking about.

I will try to search for information on how to add it.

Thanks
I actually solved the focus thing :)

playbackBufferDescription.GlobalFocus = true;

I needed to set this.
So now that's out of the way.


Now for lowering the latency, which I sadly have no idea how to do. I am guessing I need to tweak the buffers, but I don't know how.

device = new Device();
device.SetCooperativeLevel(this, CooperativeLevel.Normal);

//Use the first capture device in the list.
CaptureDevicesCollection captureDeviceCollection = new CaptureDevicesCollection();
DeviceInformation deviceInfo = captureDeviceCollection[0];
capture = new Capture(deviceInfo.DriverGuid);

short channels = 2; //Stereo.
short bitsPerSample = 16; //16-bit; alternatively use 8 bits.
int samplesPerSecond = 48000; //48 kHz; other common rates are 11025, 22050 and 44100.

//Set up the wave format to be captured.
waveFormat = new WaveFormat();
waveFormat.Channels = channels;
waveFormat.FormatTag = WaveFormatTag.Pcm;
waveFormat.SamplesPerSecond = samplesPerSecond;
waveFormat.BitsPerSample = bitsPerSample;
waveFormat.BlockAlign = (short)(channels * (bitsPerSample / (short)8));
waveFormat.AverageBytesPerSecond = waveFormat.BlockAlign * samplesPerSecond;

//96,000 bytes = 0.5 seconds of audio at 192,000 bytes per second.
captureBufferDescription = new CaptureBufferDescription();
captureBufferDescription.BufferBytes = samplesPerSecond * channels;
captureBufferDescription.Format = waveFormat;

playbackBufferDescription = new BufferDescription();
playbackBufferDescription.BufferBytes = samplesPerSecond * channels;
playbackBufferDescription.Format = waveFormat;
playbackBufferDescription.GlobalFocus = true; //Keep playing when the window loses focus.
playbackBuffer = new SecondaryBuffer(playbackBufferDescription, device);

bufferSize = captureBufferDescription.BufferBytes;


This is, I am guessing, where it creates the buffers and sets what to capture (microphone, sample rate, etc.).

I have set it to stereo, 48 kHz and 16-bit, which is what I want.



//The following lines get audio from the microphone and then send it
//across the network.

captureBuffer = new CaptureBuffer(captureBufferDescription, capture);

CreateNotifyPositions();

int halfBuffer = bufferSize / 2;

captureBuffer.Start(true);

bool readFirstBufferPart = true;
int offset = 0;

MemoryStream memStream = new MemoryStream(halfBuffer);
bStop = false;
while (!bStop)
{
    //Block until the capture notification event is signaled.
    autoResetEvent.WaitOne();
    memStream.Seek(0, SeekOrigin.Begin);
    captureBuffer.Read(offset, memStream, halfBuffer, LockFlag.None);
    readFirstBufferPart = !readFirstBufferPart;
    offset = readFirstBufferPart ? 0 : halfBuffer;

    //TODO: Fix this ugly way of initializing differently.

    //Send the data to the other party at port 1550.
    byte[] dataToWrite = memStream.GetBuffer();
    udpClient.Send(dataToWrite, dataToWrite.Length, otherPartyIP.Address.ToString(), 1550);
}


I am guessing something can be done here. I tried playing around with it, but with no success.

playbackBufferDescription.BufferBytes = samplesPerSecond*channels;

The size of the buffer determines the latency. If you have a buffer that holds 1 second's worth of audio, you will have up to 1 second of latency (i.e., it can take 1 second from the time the audio is put in the buffer until it is actually heard).

It looks like you are giving it 0.5 seconds here. Samplerate * Channels * Bytes_per_sample would be 1 second... since you have 2 bytes per sample, samplerate * channels would be 0.5 seconds.

So for a lower latency... reduce the buffer size.

Note that smaller buffer sizes mean more risk of underrun (the buffer runs out of audio faster, so you have to be quicker about keeping the buffer full).
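To make the arithmetic above concrete, here is a minimal C# sketch that converts between buffer size and buffer duration for the format used in this thread (48 kHz, stereo, 16-bit). The class and method names (BufferLatency, BytesPerSecond, BufferMilliseconds, BufferBytesFor) are made up for illustration and are not part of DirectSound. The 20 ms target is only an example value, not a recommendation.

```csharp
using System;

static class BufferLatency
{
    // Bytes of PCM audio consumed per second.
    public static int BytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * channels * (bitsPerSample / 8);
    }

    // How many milliseconds of audio a buffer of the given size holds.
    public static double BufferMilliseconds(int bufferBytes, int sampleRate, int channels, int bitsPerSample)
    {
        return 1000.0 * bufferBytes / BytesPerSecond(sampleRate, channels, bitsPerSample);
    }

    // Buffer size for a target duration, rounded down to a whole number of
    // sample frames (one frame = one sample for every channel).
    public static int BufferBytesFor(int milliseconds, int sampleRate, int channels, int bitsPerSample)
    {
        int blockAlign = channels * (bitsPerSample / 8);
        long bytes = (long)BytesPerSecond(sampleRate, channels, bitsPerSample) * milliseconds / 1000;
        return (int)(bytes - bytes % blockAlign);
    }

    static void Main()
    {
        // The buffer from the thread: samplesPerSecond * channels = 96000 bytes.
        Console.WriteLine(BufferMilliseconds(48000 * 2, 48000, 2, 16)); // prints 500
        // Size of a buffer that holds 20 ms of the same format.
        Console.WriteLine(BufferBytesFor(20, 48000, 2, 16)); // prints 3840
    }
}
```

Rounding down to a multiple of the block align keeps the buffer from ending in the middle of a sample frame.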
It seems that shrinking the playback and capture buffers lowered the latency, as you said.

But I do get underruns, or "flicker", in the audio.

Is there a way to solve this while keeping the latency low?
Is there a way to solve this while keeping the latency?


Be quicker about putting the audio in the buffer. That's really all there is to it.

Exactly how that can be accomplished depends on how you're filling the buffer now. It likely will involve sending smaller "chunks" of audio to the buffer at a time... so you are writing lots of little chunks instead of a few large chunks.
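One common way to write "lots of little chunks" is to keep a single looping buffer and copy each chunk in at a write position that advances and wraps around. The sketch below shows only that bookkeeping on a plain byte array. RingBuffer and ChunkDemo are hypothetical names, not DirectSound types, and the 20 ms chunk and 80 ms buffer sizes are assumptions chosen for illustration.

```csharp
using System;

// Hypothetical stand-in for a looping playback buffer.
sealed class RingBuffer
{
    private readonly byte[] data;
    private int writePos;

    public RingBuffer(int capacity)
    {
        data = new byte[capacity];
    }

    public int WritePosition { get { return writePos; } }

    public byte this[int index] { get { return data[index]; } }

    // Copy 'count' bytes of 'chunk' in at the write position, wrapping
    // around to the start when the end of the buffer is reached.
    public void Write(byte[] chunk, int count)
    {
        if (count > data.Length)
            throw new ArgumentException("Chunk is larger than the buffer.");

        int first = Math.Min(count, data.Length - writePos);
        Array.Copy(chunk, 0, data, writePos, first);
        if (count > first)
            Array.Copy(chunk, first, data, 0, count - first);

        writePos = (writePos + count) % data.Length;
    }
}

static class ChunkDemo
{
    static void Main()
    {
        const int chunkBytes = 3840;               // 20 ms of 48 kHz, 16-bit stereo
        var ring = new RingBuffer(chunkBytes * 4); // holds 80 ms in total
        var packet = new byte[chunkBytes];         // stands in for one received packet

        for (int i = 0; i < 5; i++)
        {
            ring.Write(packet, packet.Length);
            Console.WriteLine(ring.WritePosition); // 3840, 7680, 11520, 0, 3840
        }
    }
}
```

In a real player the writer also has to stay ahead of the position currently being played without overtaking it; that part is not shown here.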
Okay, well, I don't know how that is done right now.

But isn't this it?

//The following lines get audio from the microphone and then send it
//across the network.

captureBuffer = new CaptureBuffer(captureBufferDescription, capture);

CreateNotifyPositions();

int halfBuffer = bufferSize / 2;

captureBuffer.Start(true);

bool readFirstBufferPart = true;
int offset = 0;

MemoryStream memStream = new MemoryStream(halfBuffer);
bStop = false;
while (!bStop)
{
    //Block until the capture notification event is signaled.
    autoResetEvent.WaitOne();
    memStream.Seek(0, SeekOrigin.Begin);
    captureBuffer.Read(offset, memStream, halfBuffer, LockFlag.None);
    readFirstBufferPart = !readFirstBufferPart;
    offset = readFirstBufferPart ? 0 : halfBuffer;

    //Send the data to the other party at port 1550.
    byte[] dataToWrite = memStream.GetBuffer();
    udpClient.Send(dataToWrite, dataToWrite.Length, otherPartyIP.Address.ToString(), 1550);
}


Here is the receive side:

//Receive data.
byte[] byteData = udpClient.Receive(ref remoteEP);

//Play the data received to the user.
//Note: this creates a new playback buffer for every packet received.
playbackBuffer = new SecondaryBuffer(playbackBufferDescription, device);

playbackBuffer.Write(0, byteData, LockFlag.None);
playbackBuffer.Play(0, BufferPlayFlags.Default);


Okay, I changed my approach. I will try to use NAudio; it seems a lot easier, and I can get low latency more easily.

Though I don't know how to get the recorded data as bytes to send over UDP.