c++ Count the number of occurrences of sequences of N (acquired through user input) or more consecutive 'T's

Count the number of occurrences of sequences of N (acquired through user input) or more consecutive 'T's in a string consisting of the characters a, c, g, and t, and report this as a whole number. For example, actttaattttactttcctta has 3 poly-t sequences of length 3 or more, and only one of length 4 or more.
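#include <iostream>
#include <string>

using namespace std;

// Counts matches of n consecutive copies of key in s.
int CountOccurrenceN(const string &s, int n, char key){
    int count=0;
    size_t pos=0;
    const string p(n,key);          // the pattern to search for: n copies of key
    do{
        pos=s.find(p,pos);          // position of the next match at or after pos
        if(pos==string::npos)break; // no further match
        pos += n+1;
        count++;
    }while(true);
    return count;
}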
You can use the standard algorithm std::search_n. Let's consider your example:

#include <algorithm>
#include <cstring>
#include <iostream>
#include <iterator>

int main()
{
   const char a[] = "actttaattttactttcctta";
   const char c = 't';

   int count = 0;
   auto pos = a;
   auto end_pos = a + std::strlen( a );

   int size;

   std::cout << "Enter size of the subsequence: ";
   std::cin >> size;

   do
   {
      auto start_pos = pos;
      // find the next block of size consecutive c characters
      pos = std::search_n( start_pos, end_pos, size, c );
      if ( pos != end_pos )
      {
         count++;
         std::advance( pos, size );   // continue right after the block
      }
   } while ( pos != end_pos );

   std::cout << "count = " << count << std::endl;
}


After making some changes, I hope the code will work.

The code can be simplified by removing the variable pos.

#include <algorithm>
#include <cstring>
#include <iostream>
#include <iterator>

int main()
{
   const char a[] = "actttaattttactttcctta";
   const char c = 't';

   int count = 0;
   auto start_pos = a;
   auto end_pos   = a + std::strlen( a );

   int size;

   std::cout << "Enter size of the subsequence: ";
   std::cin >> size;

   do
   {
      // find the next block of size consecutive c characters
      start_pos = std::search_n( start_pos, end_pos, size, c );
      if ( start_pos != end_pos )
      {
         count++;
         std::advance( start_pos, size );   // continue right after the block
      }
   } while ( start_pos != end_pos );

   std::cout << "count = " << count << std::endl;
}

@Vins3Xtreme

I'd like to note that in my opinion the statement

pos += n+1;

is incorrect. It should be

pos += n;
Nice. Looks like a perfect case for a Scala one-liner:

 
"t+".r.findAllIn("actttaattttactttcctta").filter(_.length >= 3).size



Or even shorter:

 
"t{3,}".r.findAllIn("actttaattttactttcctta").size
It seems that I suggested an incorrect method! :)

Let's consider a string that contains six 't' characters:

std::string s = "tttttt";

And we are going to count the number of occurrences of sequences of 3 't'. In fact, the original string does not contain two separate sequences of 3 't'; it contains only one sequence of 6 't', which should be counted once. Meanwhile, my code returns 2 sequences of 3 't'!

So some other method is required.
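
One possible alternative, sketched here under the assumption that a run of N or more 't' should be counted exactly once however long it is: scan the string and track the length of the current run. The helper name countRuns is hypothetical and does not come from the posts above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts maximal runs of `key` in `s` whose length
// is at least `n`. Assumes n >= 1; for n == 0 it simply returns 0.
int countRuns(const std::string& s, std::size_t n, char key) {
    int count = 0;
    std::size_t run = 0;            // length of the run currently being scanned
    for (char ch : s) {
        if (ch == key) {
            ++run;
            if (run == n) ++count;  // the run has just reached the threshold
        } else {
            run = 0;                // run broken; start over
        }
    }
    return count;
}
```

Because the counter is incremented only at the moment a run reaches length n, a longer run such as "tttttt" is still counted once when n is 3.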
It's not all that much longer in C++
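// needs <iostream>, <iterator>, <regex>, <string> and using namespace std;
string in = "actttaattttactttcctta";
regex re("t{3,}");   // named object: sregex_iterator does not accept a temporary regex
cout << distance(sregex_iterator(in.begin(), in.end(), re),
                 sregex_iterator()) << '\n';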
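
Since the original task takes N from user input, the same regex idea could be wrapped in a function that builds the pattern at run time. This is only a sketch: the name countPolyT is made up for illustration, and it assumes n is at least 1.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts runs of at least n 't' characters in `in`
// by matching the pattern t{n,}. Assumes n >= 1.
std::ptrdiff_t countPolyT(const std::string& in, unsigned n) {
    // The regex is a named object so that it outlives the iterators.
    const std::regex re("t{" + std::to_string(n) + ",}");
    return std::distance(std::sregex_iterator(in.begin(), in.end(), re),
                         std::sregex_iterator());
}
```

The quantifier is greedy, so each match consumes a whole run of 't' and a long run is counted once.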
thanks everyone
@Cubbi

Bravo. I like it.