    int diffs = std::inner_product(s1.begin(), s1.end(), s2.begin(), 0,
                                   std::plus<int>(), std::not_equal_to<char>());
    std::cout << "The number of pairwise mismatches: " << diffs << '\n';
}
Given the code above, how can I add the following features to the program?
- Allow the strings to be of different lengths. In this case, it works as follows:
GCCGTAA
GCCG
These have a Hamming Distance of 3, because the "TAA" at the end is not matched in the 2nd string.
TGGC
TAGCAGG
These have a Hamming Distance of 4: Positions 1,4,5,6 don't match.
This should be done without going "out of bounds" on a string. In other words, you should not be looking at the 5th position in a string with only 4 letters (actually, since we start counting at 0, you should be careful when looking at the 4th position in a string with 4 letters as well). This is because in a string with 4 letters, the length of the string is 4, but the last position with a character is 3 (4 characters are at positions 0,1,2,3). Then, at position 4, there is a null-terminator, which is a null character denoting that a string has ended. If, however, you access position 5, then you are clearly out of bounds of the string, and may be accessing unrelated data.
- Also output the error percentage. This should involve type casting. Your solution should NOT be to just make everything into a float, because certain values clearly should be represented as integers. You must divide 2 integers, and re-type them to get an accurate decimal result. Note that the error percentage is based on the Hamming Distance and the length of the string (or the length of the longer string if the strings are different lengths). You're aspiring computer science majors, so you should be able to figure out how to turn this into a percentage.
- Allow uppercase and lowercase letters to match. For example:
aggct
AGGCT
These are a perfect match (Hamming Distance 0).
TCCGA
tccaa
These have Hamming Distance 1.
- Allow the user to enter the input either through command line arguments or being prompted for input. The way this will work is that if the user enters a command line argument, that will be used as the input. If he does not, he should be prompted for input.
- Error checking / data validation: Make sure the input consists of a nucleotide sequence, and not some random junk.
  - Make sure all characters are either A, G, C, or T (either uppercase or lowercase).
  - When using a command line argument, make sure the user only puts in the correct number of arguments.
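The different-length, case-insensitive and percentage requirements above could be sketched roughly like this. It is only one possible approach, not the assignment's reference solution, and the names hamming_distance_sketch and error_percentage are made up for illustration. The loop only reads indices below the shorter length, so it never goes out of bounds, and the leftover characters of the longer string are counted as mismatches. The percentage casts the integers to double at the point of division instead of storing everything as floating point.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: Hamming distance that tolerates different lengths
// and ignores letter case. Only indices below the shorter length are read.
std::size_t hamming_distance_sketch(const std::string& a, const std::string& b)
{
    const std::size_t common = std::min(a.size(), b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < common; ++i) {
        // cast to unsigned char first: std::toupper is undefined for negative values
        const int ca = std::toupper(static_cast<unsigned char>(a[i]));
        const int cb = std::toupper(static_cast<unsigned char>(b[i]));
        if (ca != cb) ++distance;
    }
    // every unmatched trailing character of the longer string counts as a mismatch
    return distance + (std::max(a.size(), b.size()) - common);
}

// Hypothetical helper: error percentage relative to the longer string.
// Both arguments stay integers; they are cast only for the division.
double error_percentage(std::size_t distance, std::size_t longer_length)
{
    assert(distance <= longer_length);    // a distance can never exceed the length
    if (longer_length == 0) return 0.0;   // assumption: two empty strings count as 0%
    return 100.0 * static_cast<double>(distance) / static_cast<double>(longer_length);
}
```

With the examples from the list, hamming_distance_sketch("GCCGTAA", "GCCG") gives 3 and error_percentage(3, 7) comes out at roughly 42.86.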
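#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <numeric>
#include <string>

// verify that all characters are either A, G, C, or T (either uppercase or lowercase).
bool valid( const std::string& str )
{
    constexpr char allowed_chars[] = "AGCTagct" ;
    return str.find_first_not_of(allowed_chars) == std::string::npos ;
}

// the strings are taken by value because the function modifies its own copies
std::size_t hamming_distance( std::string one, std::string two )
{
    // make sure we have valid nucleotide sequences
    // (note: assert is compiled out when NDEBUG is defined)
    assert( valid(one) && valid(two) ) ;

    // convert all chars to upper case
    for( char& c : one ) c = std::toupper( static_cast<unsigned char>(c) ) ;
    for( char& c : two ) c = std::toupper( static_cast<unsigned char>(c) ) ;

    // make the string lengths equal by adding null chars at the end;
    // a padding null char never equals a nucleotide, so it counts as a mismatch
    const auto maxsz = std::max( one.size(), two.size() ) ;
    one.resize(maxsz) ;
    two.resize(maxsz) ;

    return std::inner_product( one.begin(), one.end(), two.begin(), 0,
                               std::plus<int>(), std::not_equal_to<char>() );
}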
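
The remaining requirements (command line arguments or a prompt, plus checking the number of arguments) are not covered by the code above. A minimal sketch of one way to handle them follows. read_sequences and is_nucleotide_sequence are hypothetical names, and it assumes the program expects exactly two sequences, so argc must be either 1 (prompt the user) or 3 (program name plus two sequences). Unlike valid() above, this check also rejects an empty string, which is an assumption on my part. The streams are passed in as parameters so the function can be exercised without a real console.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical helper: same character check as valid() above,
// but an empty string is rejected as well (assumption).
bool is_nucleotide_sequence(const std::string& s)
{
    return !s.empty() && s.find_first_not_of("AGCTagct") == std::string::npos;
}

// Hypothetical helper: fills one/two from the command line if two arguments
// were given, otherwise prompts on `out` and reads from `in`.
// Returns false (after writing a message to `err`) on any invalid input.
bool read_sequences(int argc, const char* const argv[],
                    std::istream& in, std::ostream& out, std::ostream& err,
                    std::string& one, std::string& two)
{
    assert(argv != nullptr);
    if (argc == 3) {
        // program name + two sequences
        one = argv[1];
        two = argv[2];
    } else if (argc == 1) {
        // no arguments: prompt for both sequences
        out << "Enter the first sequence: ";
        in >> one;
        out << "Enter the second sequence: ";
        in >> two;
        if (!in) {
            err << "error: could not read two sequences\n";
            return false;
        }
    } else {
        // wrong number of arguments
        err << "usage: " << argv[0] << " [SEQUENCE1 SEQUENCE2]\n";
        return false;
    }
    if (!is_nucleotide_sequence(one) || !is_nucleotide_sequence(two)) {
        err << "error: sequences may contain only A, G, C, T (either case)\n";
        return false;
    }
    return true;
}
```

From main you would call read_sequences(argc, argv, std::cin, std::cout, std::cerr, one, two) and return a non-zero exit status if it reports failure, then pass the two strings on to hamming_distance.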