Vocal keys

So I have written a program that estimates the frequency of signed 16-bit raw audio by counting the number of times the data crosses the x axis. It's pretty accurate for computer-generated sine waves, but that type of data is free of static and multiplier sounds and all the other interesting artifacts. (Functions shared below anyway.)

It's especially bad at vocal data. Where do I go from here? My goal is to do a kind of autotune on vocals. I can probably have the user put on headphones so their voice is isolated at least, but how do I determine what key they are singing in?
I'm leaving dealing with endian-ness to you since by the time you've loaded it into a vector you should have already dealt with it... right?

#include <vector>
#include <cstdlib>

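// If a line were drawn from one short to the other, 
// would it have to cross the x axis?
// Compares signs directly: a short is promoted to int before >>, and
// right-shifting a negative value is implementation-defined before C++20.
bool shortsCrossZero(short first, short second)
{
	return (first < 0) != (second < 0);
}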

// I'm just worried I'll forget what exactly needs to be divided...
// Frequency is periods per second: samples per second over samples per period.
double toHertz(double samplesPerPeriod, double samplesPerSecond)
{
	return samplesPerSecond / samplesPerPeriod;
}

// Let's get the average frequency from a vector of samples.
// Returns -1 if the requested range does not fit in the vector,
// and -2 if fewer than two zero crossings are found.
double guessTone(const std::vector<short>& samples, std::size_t start, std::size_t length, std::size_t samplesPerSecond)
{
	// Each test reads samples[i + 1], so the range must end before the last sample.
	if(samples.empty() || start + length > samples.size() - 1)
	{
		return -1;
	}

	std::size_t count = 0;
	std::size_t firstFound = 0;
	std::size_t lastFound = 0;

	for(std::size_t i = start; i < start + length; i++)
	{
		if(shortsCrossZero(samples[i], samples[i + 1]))
		{
			// Use the count, not firstFound == 0, to detect the first crossing:
			// a crossing at i == start legitimately has offset 0.
			if(count == 0)
			{
				firstFound = i - start;
			}
			lastFound = i - start;
			count++;
		}
	}
	if(count > 1)
	{
		// count - 1 half-periods span lastFound - firstFound samples.
		return ((double) (count - 1) / 2.0) * (double) samplesPerSecond / (double) (lastFound - firstFound);
	}
	return -2;
}
I've moved this to the Lounge because it's going to be a long-term project, what with midterms coming up. I'll post links to PDFs, tutorials and Google searches that I want to read through.

Goal:
Vocal recognition and autotune/key-correction.

Step 1: Resources and terminology
- These links may or may not be directly related to audio processing, but by following their annotated bibliographies we may stumble across enough grains to make a sandcastle.

Current Reading:

http://www.codewithc.com/c-program-for-linear-exponential-curve-fitting/

http://math.stackexchange.com/questions/36725/how-to-fit-a-curve-to-a-sinusoidal-wave

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.4154&rep=rep1&type=pdf

http://www.ismir2004.ismir.net/proceedings/p028-page-138-paper183.pdf

http://researchcommons.waikato.ac.nz/bitstream/handle/10289/5700/thesis.pdf?sequence=3

http://www.ee.columbia.edu/~dpwe/pubs/waspaa01-singing.pdf

https://www.researchgate.net/profile/Christopher_Harte/publication/200806168_Detecting_harmonic_change_in_musical_audio/links/02e7e518cea7de8b42000000/Detecting-harmonic-change-in-musical-audio.pdf

https://www.researchgate.net/publication/230554907_An_efficient_algorithm_for_the_calculation_of_a_constant_Q_transform

https://www.google.com/search?q=DYNAMIC+CHROMA+FEATURE+VECTORS&ie=utf-8&oe=utf-8

http://dl.acm.org/citation.cfm?id=1178727

https://www.google.com/search?q=st+Fourier+trans-+form&ie=utf-8&oe=utf-8#q=fast+Fourier+trans-+form&*

http://learning.eng.cam.ac.uk/pub/Public/Turner/Presentations/gp-audio.pdf

https://books.google.com/books?id=UTbTBwAAQBAJ&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

http://www.eas.uccs.edu/~mwickert/ece2610/lecture_notes/ - slight hack, will be broken if he ever puts an index.html in this folder

https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/04-bookFMP/02-slides/Mueller_FMP_Chapter3.pdf

http://www.cs.bu.edu/~snyder/cs591/Handouts/AudioProgrammingInC.pdf

If it is a continuous note, then you should Fourier transform (FFT) the signal and pick off the dominant frequency. Most instruments, and presumably the human voice too, also produce harmonics (integer multiples of the base frequency). For a mix of frequencies, the frequency spectrum is a more reliable guide than trying to pick up the zero crossings.
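
To make "pick off the dominant frequency" concrete, here is a minimal sketch. It uses a naive DFT instead of a real FFT so it fits in one function; an FFT would produce the same magnitudes much faster. The name dominantFrequency and the minHz/maxHz search limits are my own assumptions, not anything from the code posted above. Be aware that for a voice the strongest bin may be a harmonic rather than the fundamental, so treat the result as a first guess.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scans the DFT bins between minHz and maxHz and
// returns the centre frequency of the strongest one, or -1 if no bin
// falls inside the range. Naive O(N * bins) DFT, not an FFT.
double dominantFrequency(const std::vector<short>& samples,
                         double samplesPerSecond,
                         double minHz, double maxHz)
{
    assert(samplesPerSecond > 0.0 && minHz > 0.0 && maxHz >= minHz);
    const std::size_t n = samples.size();
    if (n < 2) return -1.0;

    const double pi = std::acos(-1.0);
    const double binWidth = samplesPerSecond / static_cast<double>(n);
    std::size_t kMin = static_cast<std::size_t>(std::ceil(minHz / binWidth));
    std::size_t kMax = static_cast<std::size_t>(std::floor(maxHz / binWidth));
    if (kMin < 1) kMin = 1;          // skip the DC bin
    if (kMax > n / 2) kMax = n / 2;  // stay at or below Nyquist

    double bestPower = -1.0;
    std::size_t bestBin = 0;
    for (std::size_t k = kMin; k <= kMax; ++k)
    {
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            const double angle = 2.0 * pi * static_cast<double>(k)
                               * static_cast<double>(i) / static_cast<double>(n);
            re += samples[i] * std::cos(angle);
            im -= samples[i] * std::sin(angle);
        }
        const double power = re * re + im * im;
        if (power > bestPower)
        {
            bestPower = power;
            bestBin = k;
        }
    }
    if (bestBin == 0) return -1.0;
    return static_cast<double>(bestBin) * binWidth;
}
```

The resolution is samplesPerSecond / samples.size() Hz per bin, so a longer buffer gives a finer answer at the cost of more work.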

For changing notes - I'm not sure. Maybe you should FFT some relatively short windows.
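
One way to read "relatively short windows" is to cut the recording into overlapping frames and analyse each frame on its own. The sketch below only does the cutting and applies a Hann taper to each frame (a common choice to reduce leakage at the frame edges). The name hannFrames and its parameters are assumptions of mine, and the right frame length and hop for vocals is something you would have to experiment with.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits the signal into frames of frameLength
// samples, advancing by hop samples each time, and multiplies every
// frame by a Hann window. Trailing samples that do not fill a whole
// frame are dropped.
std::vector<std::vector<double>> hannFrames(const std::vector<short>& samples,
                                            std::size_t frameLength,
                                            std::size_t hop)
{
    assert(frameLength >= 2 && hop >= 1);
    std::vector<std::vector<double>> frames;
    if (samples.size() < frameLength) return frames;

    const double pi = std::acos(-1.0);
    std::vector<double> window(frameLength);
    for (std::size_t i = 0; i < frameLength; ++i)
    {
        window[i] = 0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(i)
                                          / static_cast<double>(frameLength - 1)));
    }

    for (std::size_t start = 0; start + frameLength <= samples.size(); start += hop)
    {
        std::vector<double> frame(frameLength);
        for (std::size_t i = 0; i < frameLength; ++i)
        {
            frame[i] = samples[start + i] * window[i];
        }
        frames.push_back(frame);
    }
    return frames;
}
```

Each returned frame can then be passed to whatever transform you settle on, giving one frequency estimate per frame instead of one for the whole recording.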

What key are they singing in? Well, identify the notes first and find the best comparison with the standard scales, although I'm not sure that this will distinguish major and minor scales and I wouldn't try this approach on a composer like Schoenberg!
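
A rough sketch of "identify the notes, then compare with the standard scales", assuming equal temperament with A4 = 440 Hz and testing only the twelve major scales. The names pitchClass and bestMajorKey are hypothetical. Since a major key and its relative minor contain the same notes, this really cannot tell them apart, which matches the doubt above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pitch class of a frequency: 0 = C, 1 = C#, ..., 9 = A, 11 = B.
// Assumes equal temperament with A4 = 440 Hz (MIDI note 69).
int pitchClass(double hertz)
{
    assert(hertz > 0.0);
    const long midi = std::lround(69.0 + 12.0 * std::log2(hertz / 440.0));
    return static_cast<int>(((midi % 12) + 12) % 12);
}

// Returns the tonic (as a pitch class) of the major scale that contains
// the most of the given notes. Ties go to the lowest tonic.
int bestMajorKey(const std::vector<int>& pitchClasses)
{
    // Semitone offsets 0, 2, 4, 5, 7, 9, 11 from the tonic form a major scale.
    static const bool inMajorScale[12] = { true, false, true, false, true, true,
                                           false, true, false, true, false, true };
    int bestTonic = 0;
    std::size_t bestHits = 0;
    for (int tonic = 0; tonic < 12; ++tonic)
    {
        std::size_t hits = 0;
        for (int pc : pitchClasses)
        {
            if (inMajorScale[((pc - tonic) % 12 + 12) % 12]) ++hits;
        }
        if (hits > bestHits)
        {
            bestHits = hits;
            bestTonic = tonic;
        }
    }
    return bestTonic;
}
```

Feeding it the pitch classes of the detected notes gives a tonic number; counting each note once per occurrence means long or repeated notes weigh more, which may or may not be what you want.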

Thank you. I came across the term FFT in my search and I wasn't 100% sure how closely related it was. I'll keep adding links to the post above. I did find a nice article on Fourier transforms for voice detection
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.4154&rep=rep1&type=pdf )
From just skimming through most of these articles, I'm seeing functions that I understand about 70% of (currently I'm in Calculus 3 and have not taken statistics). A lot of these ideas are also thoroughly discussed on YouTube, if I can keep myself from getting distracted over there. I'm hoping to buckle down over spring break and write the equations out as functions.

I'm looking forward to when I can put all these ideas together.

Some notes about vocals:

http://www.audio-issues.com/music-mixing/5-need-to-know-frequency-areas-of-the-vocal/

Wikipedia
Several little-known works call for pitches higher than G6. For example, the soprano Mado Robin, who was known for her exceptionally high voice, sang a number of compositions created especially to exploit her highest notes, reaching C7. Robin also added a number of her top notes to other arias.

http://www.phy.mtu.edu/~suits/notefreqs.html
Note that finding zero-crossings is not enough to measure the frequency of a signal. Consider f(t) = sin(t * tau) + 2, where tau = 2 * pi. Its frequency is 1 Hz, yet there is no real t such that f(t) = 0.
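
A small sketch of that failure case, and of one partial workaround: count crossings of the signal's mean instead of crossings of zero. The function names are hypothetical. This only deals with a constant offset like the "+ 2" above; it does nothing for harmonics or noise, which can still add or hide crossings.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Counts how many times consecutive samples fall on opposite sides of level.
std::size_t crossingsAbout(const std::vector<short>& samples, double level)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < samples.size(); ++i)
    {
        if ((samples[i] < level) != (samples[i + 1] < level)) ++count;
    }
    return count;
}

// Same count, but measured against the average sample value, so a
// constant (DC) offset no longer hides the oscillation.
std::size_t crossingsAboutMean(const std::vector<short>& samples)
{
    assert(!samples.empty());
    const double mean = std::accumulate(samples.begin(), samples.end(), 0.0)
                      / static_cast<double>(samples.size());
    return crossingsAbout(samples, mean);
}
```

For a wave that swings between 1000 and 3000, crossingsAbout(samples, 0.0) finds nothing, while crossingsAboutMean(samples) sees every swing.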

If you just want to measure frequency and do pitch modulation, a DCT should be faster than an FFT, IINM.
Thank you, Helios.
I'll look into discrete cosine transforms as well, time permitting:

https://books.google.com/books?hl=en&lr=&id=fWviBQAAQBAJ

https://scholar.google.com/scholar?q=discrete+cosine+transform&hl=en&as_sdt=0&as_vis=1&oi=scholart&sa=X&sqi=2&ved=0ahUKEwiz-snDt9rSAhVKl1QKHZ4eBlUQgQMIGDAA

The reading list is getting a bit too heavy for a single week (from the reading so far I know I'm way out of my league here). I'll try to keep this thread alive and updated as articles pop up. Please keep the recommendations coming.

Feel free to also post lists of recommended links. I'm hoping to start dropping a function here and there as I work things out.