Counting letters in a sequence.

Apr 23, 2011 at 6:39pm
I was given an assignment to write a program that takes the string

"ATGCTAGTATTTGGATAGATAGATAGATAGATAGATAGATAAAAAAATTTTTTTT "

and counts how many times "TT" occurs without counting the same characters twice.
The answer is 5, but my program keeps coming up with 9.

This is what I have so far.

Any ideas on how to fix this problem?


///////////////////////////////////////////////////
#include <cstdlib>
#include <cstring>   // strstr
#include <iostream>

using namespace std;

int overlap(char *ptr1, char *ptr2);
int pattern(char *ptr3, char *ptr4);

int main()
{
    char strg1[] = "To those who do not know mathematics it is difficult to get across a real feeling as to the beauty, the deepest beauty, of nature. If you want to learn about nature, to appreciate nature, it is necessary to understand the language that she speaks in.", strg2[] = "nature";
    char *ptr1(strg1), *ptr2(strg2);
    overlap(ptr1, ptr2);

    char strg3[] = "ATGCTAGTATTTGGATAGATAGATAGATAGATAGATAGATAAAAAAATTTTTTTT", strg4[] = "TT";
    char *ptr3(strg3), *ptr4(strg4);
    pattern(ptr3, ptr4);
    system("PAUSE");
    return 0;
}

// Counts and prints how often the word ptr2 occurs in ptr1.
int overlap(char *ptr1, char *ptr2)
{
    int count(0);
    while ((ptr1 = strstr(ptr1, ptr2)) != NULL)
    {
        count++;
        ptr1++;
    }
    cout << "The word '" << ptr2 << "' appears "
         << count << " times in the string." << endl;
    return count;
}

// Counts and prints how often the sequence ptr4 occurs in ptr3.
int pattern(char *ptr3, char *ptr4)
{
    int count2(0);
    while ((ptr3 = strstr(ptr3, ptr4)) != NULL)
    {
        count2++;
        ptr3++;
    }
    cout << endl << endl;

    cout << "The sequence '" << ptr4 << "' appears "
         << count2 << " times in the DNA sequence." << endl;
    return count2;
}

////////////////////////////////////////////////////////////
Apr 23, 2011 at 6:46pm
You are counting the second T in a pair as the first T in the following pair; in essence, you are counting the same letter twice in some cases.

When you find a TT pair, add to your count, and then move the pointer keeping track of where you are in the string forwards another element, so that you don't count the second T as the first T in a new pair.
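
To make the difference concrete, here is a rough sketch that counts both ways. The helper name count_tt and its allow_overlap flag are made up for illustration and are not from the original program. It walks the string by index and, after a match, moves forward by one character (pairs may share a T) or by two (they may not):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (name and flag are illustrative only).
// Counts "TT" pairs in s. With allow_overlap == true the second T of a
// pair may also start the next pair; with false each T is used once.
int count_tt(const char *s, bool allow_overlap)
{
    int count = 0;
    const std::size_t len = std::strlen(s);
    std::size_t i = 0;
    while (i + 1 < len) {
        if (s[i] == 'T' && s[i + 1] == 'T') {
            ++count;
            i += allow_overlap ? 1 : 2;  // 2 skips both Ts of the pair
        } else {
            ++i;                         // no pair here, try the next character
        }
    }
    return count;
}
```

On the DNA string from the first post, the overlapping count should come out as 9 and the non-overlapping count as 5, which matches the two numbers mentioned there.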
Apr 23, 2011 at 6:49pm
Can't you use std::string? Anyway, I believe the problem is you're always advancing your index by one (i.e. ptr1++ or ptr3++) instead of advancing it by the length of the substring you're looking for so you won't count the same characters twice.
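
If std::string is allowed, something along these lines might work. It is only a sketch: the function name count_non_overlapping is my own invention, and it relies on std::string::find with a start position so each search resumes just past the previous match:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (name is illustrative only): counts occurrences of
// needle in text without reusing any character in two matches.
std::size_t count_non_overlapping(const std::string &text,
                                  const std::string &needle)
{
    if (needle.empty()) {
        return 0;  // an empty pattern would otherwise never advance
    }
    std::size_t count = 0;
    std::size_t pos = text.find(needle);
    while (pos != std::string::npos) {
        ++count;
        // Resume the search after the whole match, not one character later.
        pos = text.find(needle, pos + needle.size());
    }
    return count;
}
```

You would call it as count_non_overlapping(strg3, strg4), since char arrays convert to std::string implicitly.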
Apr 23, 2011 at 9:27pm
Hmm... I am just having so many issues figuring this out.
Apr 23, 2011 at 9:51pm
The crux of your problem is that when you find a match, you need to advance beyond that match and then keep looking.

Accordingly, the advance you make when you've found a match should not be one character, but as many characters as the match contains.

So, ptr3++ advances only one letter, but TT is of size two, so you need to advance two letters.

ptr3 = ptr3 + 2;

A nicer way of doing that is to let the code figure out how far to advance. The cstring function strlen returns the number of characters in a C string (not counting the terminating null character), so you could use

ptr3 = ptr3 + strlen(ptr4);

Since ptr4 is the word you're trying to match, its length is how many letters you need to advance to make sure you don't try to match any letters that are already part of a match.
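
Put together, the counting loop could look roughly like this. Treat it as a sketch: the function name count_matches is hypothetical, and I've left the printing out so that only the loop is shown:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (name is illustrative only): counts occurrences of
// pattern in text, stepping past each whole match before searching again.
int count_matches(const char *text, const char *pattern)
{
    const std::size_t step = std::strlen(pattern);
    if (step == 0) {
        return 0;  // guard: an empty pattern would loop forever
    }
    int count = 0;
    while ((text = std::strstr(text, pattern)) != NULL) {
        ++count;
        text += step;  // advance by the match length instead of by one
    }
    return count;
}
```

The parameters are const char * because the function only reads the strings; your char arrays can be passed to it unchanged.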


Your functions overlap and pattern are almost identical, differing only in the text they print to the screen. The idea of functions is to save having to write out the same code repeatedly, so having two near-identical functions seems a bit silly. :)
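
One possible way to merge them, as a sketch only: the name count_and_report and the label parameter are my own suggestions, and the printed wording is simplified from your two messages. The part that differed between the functions becomes an argument:

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

// Hypothetical single replacement for overlap() and pattern().
// label describes what is being searched, e.g. "string" or "DNA sequence".
int count_and_report(const char *text, const char *target, const char *label)
{
    const std::size_t step = std::strlen(target);
    int count = 0;
    if (step > 0) {  // an empty target would never advance
        while ((text = std::strstr(text, target)) != NULL) {
            ++count;
            text += step;  // skip the whole match
        }
    }
    std::cout << "'" << target << "' appears " << count
              << " times in the " << label << "." << std::endl;
    return count;
}
```

In main you would then call count_and_report(strg1, strg2, "string") and count_and_report(strg3, strg4, "DNA sequence").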
Last edited on Apr 23, 2011 at 9:58pm