Counting repetitive bases in the genome

Can you modify this C++ code so that I can count the number of occurrences of TTTT and GGGG?

Please keep in mind that the input file contains around a billion characters (A, T, G, C), so the code must be efficient.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main ()
{
    char ch;
    // 64-bit counters: with around a billion bases, int counters can overflow,
    // and countA*100 overflows a 32-bit int long before that.
    long long countA=0, countT=0, countG=0, countC=0, total;
    ifstream myfile ("file.txt");

    if (myfile.is_open())
    {
        // Read one character at a time and tally each base.
        while (myfile.get(ch))
        {
            if(ch == 'A')
                countA++;
            else if(ch == 'T')
                countT++;
            else if(ch == 'G')
                countG++;
            else if(ch == 'C')
                countC++;
        }
        myfile.close();

        total = countA+countT+countG+countC;
        cout<<"\n Total no. of bases = "<< total;
        if (total == 0)
            return 0;   // nothing to report; also avoids dividing by zero below

        cout<<"\n Total no. of A's = "<<countA;
        cout<<"\n Percentage = "<<countA*100/total;
        cout<<"\n Total no. of T's = "<<countT;
        cout<<"\n Percentage = "<<countT*100/total;
        cout<<"\n Total no. of G's = "<<countG;
        cout<<"\n Percentage = "<<countG*100/total;
        cout<<"\n Total no. of C's = "<<countC;
        cout<<"\n Percentage = "<<countC*100/total;
    }
    else cout << "Unable to open file";
    return 0;
}
So what problem are you having?

Please use code tags; see here for directions: http://www.cplusplus.com/articles/z13hAqkS/
For speed, avoid using the formatted input options; instead, read from the underlying stream directly.
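As a rough illustration of reading the stream directly, here is a minimal sketch that pulls the data in large unformatted blocks with read() and gcount() rather than one character per call. The name countBytes and the 1 MiB default buffer size are my own assumptions, not anything from the original post, and whether it is actually faster on your machine is something you would have to measure.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): tallies every byte value while
// reading the stream in large unformatted blocks.
std::array<std::uint64_t, 256> countBytes(std::istream& in,
                                          std::size_t bufSize = 1 << 20)
{
    assert(bufSize > 0);
    std::array<std::uint64_t, 256> counts{};   // all counters start at zero
    std::vector<char> buf(bufSize);

    // read() reports failure on the final short block, so gcount() is also
    // checked to make sure that last partial block is still processed.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size()))
           || in.gcount() > 0)
    {
        const std::size_t n = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < n; ++i)
            ++counts[static_cast<unsigned char>(buf[i])];
    }
    return counts;
}

// Convenience overload for small in-memory sequences, e.g. for quick checks.
std::array<std::uint64_t, 256> countBytes(const std::string& seq)
{
    std::istringstream in(seq);
    return countBytes(in);
}
```

With a file, you would open it as std::ifstream myfile("file.txt", std::ios::binary), call countBytes(myfile), and look up counts['A'], counts['T'], counts['G'] and counts['C']. The counters are 64-bit so that a billion bases cannot overflow them.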

By the way, your question is a little misleading. I thought you were looking for the number of times you have continuous subsequences of the same nucleobase, like AA+.

Hope this helps.
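Since "the number of TTTT" can be read more than one way, here is a minimal sketch that tracks the current run length of one base and reports three different counts in a single pass. The names RunStats and countRuns are hypothetical, and skipping '\n' and '\r' so that line wrapping does not break a run is my assumption about the file layout, not something stated in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical result type: three readings of "how many runs of k bases".
struct RunStats {
    std::uint64_t overlapping = 0;     // TTTTT counts as 2 occurrences of TTTT
    std::uint64_t nonOverlapping = 0;  // TTTTT counts as 1, TTTTTTTT as 2
    std::uint64_t maximalRuns = 0;     // stretches of k or more, i.e. T{k,}
};

// Counts runs of `base` of length k, reading through the stream buffer.
RunStats countRuns(std::istream& in, char base, std::size_t k)
{
    assert(k > 0);
    RunStats stats;
    std::uint64_t run = 0;   // length of the current stretch of `base`

    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        const char c = *it;
        if (c == '\n' || c == '\r')
            continue;        // assumption: line breaks do not interrupt a run
        if (c == base) {
            ++run;
            if (run >= k)     ++stats.overlapping;
            if (run % k == 0) ++stats.nonOverlapping;
            if (run == k)     ++stats.maximalRuns;
        } else {
            run = 0;
        }
    }
    return stats;
}

// Convenience overload for small in-memory sequences, e.g. for quick checks.
RunStats countRuns(const std::string& seq, char base, std::size_t k)
{
    std::istringstream in(seq);
    return countRuns(in, base, k);
}
```

For the original question you would call countRuns(myfile, 'T', 4) and, after reopening or rewinding the file, countRuns(myfile, 'G', 4), then pick whichever of the three fields matches what you mean. Because only the current run length is kept, memory use stays constant no matter how large the file is.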