This is possibly more of an algorithm question than a C++ one.
Suppose I have two vectors:
std::vector<double> first = {1.0,2.01,3.05,4.05};
std::vector<double> second = {1,2,3,3,4}; // very similar except no decimals and a second "3"
And now I want an algorithm that will tell me how similar these two vectors are. That is, I want to know how much of first is similar to second. An appropriate answer in this case might be about 80%, as only 1 out of 5 elements would have to be removed (second[2]) from second to make it almost identical to first.
Are you aware of any (preferably fast) methods of achieving this? I'm open to suggestions for alternatives.
Suppose I have two "cells", each composed of a series of genes, which in turn are really just sets of numbers called codons (they're basically doubles). I want to give each cell a number, or metric, that indicates the degree of similarity between its genome and that of another cell. The problem is that while two codons might be very similar, they're not necessarily the same (3.01 and 3.02 similar, but not the same).
I'm not sure Levenshtein distance will help; from what I can tell it's only for discrete sets such as integers and letters. But I think it's a start, so I'll do some more reading :)
Again, you need to define 'similar' for yourself.
You said this:
3.01 and 3.02 similar, but not the same
but you *could* argue they are not similar at all. That they are completely different numbers.
You could define a tolerance, so that if the float falls within that range of your integer you class them as 'similar'. Or, if the ratio of the float to the int is close to one, you could class that as similar.
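To make those two options concrete, here is a minimal sketch of both tests. The function names and the tol and eps parameters are made up for illustration, and the threshold values you pass in are something you would have to pick for your own data.

```cpp
#include <cmath>

// Absolute test: a and b count as "similar" if they are at most tol apart.
// (Hypothetical helper; tol is a threshold you choose yourself.)
bool within_tolerance(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}

// Relative test: a and b count as "similar" if a / b is within eps of 1.
// The ratio is undefined when b is zero, so this sketch only treats
// zero as similar to zero (an assumption, not the only sensible choice).
bool ratio_near_one(double a, double b, double eps) {
    if (a == 0.0 || b == 0.0) {
        return a == b;
    }
    return std::fabs(a / b - 1.0) <= eps;
}
```

The absolute test treats 1000.0 vs 1000.04 the same as 1.0 vs 1.04, while the ratio test scales with the size of the numbers, so which one fits depends on what your codons represent.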
I suppose I could quantify the degree of similarity in terms of the following:
1) If the integer component (i.e. the result of casting to int) of two floats is the same, then the "difference" between them shall be the fractional difference between them. So, if we compare "3.1" and "3.2", the difference should be 10% of what the Levenshtein distance would assign to two different integers.
2) Any two numbers that are more than an integer apart should be considered "different", as per the Levenshtein distance.
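Rules 1) and 2) could be sketched as an edit distance with a fractional substitution cost, roughly like this. It is only one reading of the rules: the names codon_cost, weighted_edit_distance and similarity are made up, insertions and deletions are assumed to cost 1, and pairs such as 2.9 and 3.1 (different integer parts, but less than one apart) are treated as fully different, which the rules above do not really settle.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Substitution cost: same integer part -> fractional difference,
// otherwise a full "different" cost of 1 (assumed reading of rules 1 and 2).
double codon_cost(double a, double b) {
    if (std::trunc(a) == std::trunc(b)) {
        // Clamp to 1 so a pair like -0.9 and 0.9 never costs more than a mismatch.
        return std::min(1.0, std::fabs(a - b));
    }
    return 1.0;
}

// Levenshtein-style dynamic programme, keeping only two rows.
// Inserting or deleting one element costs 1.
double weighted_edit_distance(const std::vector<double>& x,
                              const std::vector<double>& y) {
    const std::size_t n = x.size();
    const std::size_t m = y.size();
    std::vector<double> prev(m + 1), curr(m + 1);
    for (std::size_t j = 0; j <= m; ++j) {
        prev[j] = static_cast<double>(j);
    }
    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = static_cast<double>(i);
        for (std::size_t j = 1; j <= m; ++j) {
            curr[j] = std::min({prev[j] + 1.0,       // delete x[i-1]
                                curr[j - 1] + 1.0,   // insert y[j-1]
                                prev[j - 1] + codon_cost(x[i - 1], y[j - 1])});
        }
        std::swap(prev, curr);
    }
    return prev[m];
}

// Similarity in [0, 1]: 1 minus the distance divided by the longer length.
double similarity(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t longest = std::max(x.size(), y.size());
    if (longest == 0) {
        return 1.0;  // two empty vectors are treated as identical
    }
    return 1.0 - weighted_edit_distance(x, y) / static_cast<double>(longest);
}
```

For the two vectors from the original question, the distance is one deletion plus the small fractional differences (0.01 + 0.05 + 0.05), so similarity(first, second) works out to roughly 0.78, close to the 80% guessed at the start. The cost is O(n*m) time for vectors of length n and m.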