So I have written a program that estimates the frequency of signed 16-bit raw audio by counting the number of times the data crosses the x axis. It's pretty accurate for computer-generated sine waves, but that type of data is free of static and multiplier sounds and all the other interesting artifacts. (Functions shared below anyways.)
It's especially bad at vocal data. Where do I go from here? My goal is to do a kind of autotune on vocals. I can probably have the user put on headphones so their voice is isolated at least, but how do I determine what key they are singing in?
#include <vector>
#include <cstdlib>
// If a line were drawn from one short to the other,
// would it have to cross the x axis?
bool shortsCrossZero(short first, short second)
{
    // The signs differ exactly when one sample is negative and the other is not.
    // Zero counts as non-negative.
    return (first < 0) != (second < 0);
}
// I'm just worried I'll forget what exactly needs to be divided...
double toHertz(double samplesPerPeriod, double samplesPerSecond)
{
    // (samples / second) / (samples / period) = periods / second = Hz
    return samplesPerSecond / samplesPerPeriod;
}
// Let's get the average frequency from a vector of samples
// returns negative if a full period is not encountered.
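double guessTone(const std::vector<short>& samples, std::size_t start, std::size_t length, std::size_t samplesPerSecond)
{
    // samples[i + 1] must exist for every i in [start, start + length).
    if(start + length >= samples.size())
    {
        return -1; // the requested window does not fit in the buffer
    }
    std::size_t count = 0;
    std::size_t firstFound = 0;
    std::size_t lastFound = 0;
    for(std::size_t i = start; i < start + length; i ++)
    {
        if(shortsCrossZero(samples[i], samples[i + 1]))
        {
            // Use count, not firstFound == 0, to detect the first crossing,
            // so a crossing at the very start of the window is not overwritten.
            if(count == 0)
            {
                firstFound = i;
            }
            lastFound = i;
            count ++;
        }
    }
    if(count > 1)
    {
        // (count - 1) half-periods span (lastFound - firstFound) samples.
        return ((double) (count - 1) / 2.0) * (double) samplesPerSecond / (double) (lastFound - firstFound);
    }
    return -2; // fewer than two crossings, so there is no period to measure
}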
I've moved this to lounge because it's going to be a long-term project, what with midterms coming up. I'll post links to PDFs, tutorials and Google searches that I want to read through.
Goal:
Vocal recognition and autotune/key-correction.
Step 1: Resources and terminology
- These links may or may not be directly related to audio processing, but by following their annotated bibliographies we may stumble across enough grains to make a sandcastle.
If it is a continuous note then you should Fourier transform (or FFT) the signal and pick off the dominant frequency. Most instruments, and so presumably the human voice, also contain harmonics (integer multiples of the base frequency). For a mix of frequencies, the frequency spectrum is a more reliable guide than trying to pick up the zero crossings.
For changing notes - I'm not sure. Maybe you should FFT some relatively short windows.
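To make the short-window idea concrete, here is a minimal sketch of picking the strongest frequency in one window. It is only an illustration under some assumptions of mine: dominantFrequency is a hypothetical name, it uses a naive O(N^2) DFT rather than a real FFT, and the Hann window and the 60 to 2200 Hz search band are my own choices, not something from this thread. Also note that the strongest bin can land on a harmonic rather than the base frequency.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): returns the frequency in Hz of
// the strongest DFT bin between minHz and maxHz within one window, or -1 if
// the window does not fit in the buffer or the band contains no bins.
double dominantFrequency(const std::vector<short>& samples, std::size_t start,
                         std::size_t length, double samplesPerSecond,
                         double minHz = 60.0, double maxHz = 2200.0)
{
    if (length < 2 || start + length > samples.size()) return -1.0;
    const double pi = 3.14159265358979323846;

    // A Hann window reduces leakage caused by cutting the signal mid-cycle.
    std::vector<double> windowed(length);
    for (std::size_t n = 0; n < length; ++n) {
        double w = 0.5 - 0.5 * std::cos(2.0 * pi * n / (length - 1));
        windowed[n] = w * samples[start + n];
    }

    // Bin k corresponds to k * samplesPerSecond / length Hz.
    std::size_t kMin = (std::size_t) std::ceil(minHz * length / samplesPerSecond);
    std::size_t kMax = (std::size_t) std::floor(maxHz * length / samplesPerSecond);
    if (kMin < 1) kMin = 1;
    if (kMax > length / 2) kMax = length / 2;

    double bestPower = -1.0;
    std::size_t bestBin = 0;
    for (std::size_t k = kMin; k <= kMax; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t n = 0; n < length; ++n) {
            double angle = 2.0 * pi * k * n / length;
            re += windowed[n] * std::cos(angle);
            im -= windowed[n] * std::sin(angle);
        }
        double power = re * re + im * im;
        if (power > bestPower) { bestPower = power; bestBin = k; }
    }
    if (bestBin == 0) return -1.0;
    return bestBin * samplesPerSecond / length;
}
```

The resolution is samplesPerSecond / length Hz per bin, so shorter windows follow changing notes better but pin down the pitch less precisely. For changing notes you would call this on successive (possibly overlapping) windows.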
What key are they singing in? Well, identify the notes first and find the best comparison with the standard scales, although I'm not sure that this will distinguish major and minor scales and I wouldn't try this approach on a composer like Schoenberg!
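As a rough sketch of "identify the notes, then compare with the standard scales": the first function maps a frequency to the nearest equal-tempered note, assuming A4 = 440 Hz is MIDI note 69, and the second counts how many detected notes fall in each of the twelve major scales. Both names (nearestMidiNote, bestMajorKey) and the simple hit-count scoring are my own assumptions for illustration. A natural minor scale uses the same notes as its relative major, which is why counting notes alone will not separate the two.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest MIDI note number for a frequency, assuming A4 = 440 Hz (MIDI 69)
// and equal temperament.
int nearestMidiNote(double hertz)
{
    return (int) std::lround(69.0 + 12.0 * std::log2(hertz / 440.0));
}

// Hypothetical scoring: returns the tonic pitch class (0 = C ... 11 = B) of
// the major scale that contains the most of the detected notes. Ties go to
// the lowest tonic.
int bestMajorKey(const std::vector<int>& midiNotes)
{
    // Semitone offsets of the major scale degrees from the tonic.
    const std::array<int, 7> majorSteps = {0, 2, 4, 5, 7, 9, 11};
    int bestTonic = 0;
    std::size_t bestHits = 0;
    for (int tonic = 0; tonic < 12; ++tonic) {
        std::size_t hits = 0;
        for (int note : midiNotes) {
            int degree = ((note - tonic) % 12 + 12) % 12;
            for (int step : majorSteps) {
                if (degree == step) { ++hits; break; }
            }
        }
        if (hits > bestHits) { bestHits = hits; bestTonic = tonic; }
    }
    return bestTonic;
}
```

In practice you would probably weight notes by how long they are held, but a plain count is enough to show the idea.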
Thank you. I came across the term FFT in my search and I wasn't 100% sure how closely related it was. I'll keep adding links to the post above. I did find a nice article on Fourier analysis for voice detection:
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.4154&rep=rep1&type=pdf )
From just skimming through most of these articles I'm seeing functions that I understand about 70% of (currently I'm in Calculus 3 and have not taken statistics). A lot of these ideas are also thoroughly discussed on YouTube, if I can keep myself from getting distracted over there. I'm hoping to buckle down over this spring break and write the equations out as functions.
I'm looking forward to when I can put all these ideas together.
Several little-known works call for pitches higher than G6. For example, the soprano Mado Robin, who was known for her exceptionally high voice, sang a number of compositions created especially to exploit her highest notes, reaching C7.[8] Robin also added a number of her top notes to other arias.
Note that finding zero-crossings is not enough to measure the frequency of a signal. Consider f(t) = sin(t * tau) + 2, where tau = 2 * pi: the frequency of f is 1 Hz, yet there is no real t such that f(t) = 0.
If you just want to measure frequency and do pitch modulation, a DCT should be faster than an FFT, IINM.
The reading list is getting a bit too heavy for a single week (from the reading so far I know I'm way out of my league here). I'll try to keep this thread alive and updated as articles pop up. Please keep the recommendations coming.
Feel free to also post lists of recommended links, I'm hoping to also start dropping a function here and there as I work things out.
I'm hoping that as I go through these articles and books I can start narrowing down better sources. A full day feels wasted at this point: the shorter articles expect the reader to be an expert and usually end up leaving more questions than answers. How-To's and back-to-basics;