Look for specific letters in a string

closed account (4ybDGNh0)
I am trying to write code where the user inputs a string of RNA and is told whether a start codon and a stop codon exist. To "start", the string must contain 'AUG', and to stop it needs 'UGA', 'UAG', or 'UAA'. For some reason my code won't run. Any suggestions?
#include <iostream>
#include <string>

using namespace std;
int i;

int main()
{
    string seq1;
    cout << "Enter a DNA sequence : "<<endl;
    cin >> seq1;

    bool start;
    bool stop;

    for(decltype(seq1.length()) i = 0; i < seq1.length(); i++)
    {
        switch(seq1[i])
        {
            case 'AUG' :
            {
                if(seq1[i] == 'AUG')
                {
                    start = true;
                }
                else
                {
                    start = false;
                }
                break;
            case 'UAA' : case 'UAG' : case 'UGA' :
            {
                if(seq1[i] == 'UAA')
                {
                    stop = true;
                }
                else
                    if(seq1[i] == 'UAG')
                    {
                        stop = true;
                    }
                else
                    if (seq1[i] == 'UGA')
                    {
                        stop = true;
                    }
                else
                {
                    stop = false;
                }
            }
                break;
        }

        if (start == true)
        {
            cout<<"Yes, a start codon exists"<<endl;
        }
        else
        {
            cout<<"No, a start codon does not exist"<<endl;
        }
        if ( stop == true)
        {
            cout<<"Yes, a stop codon exists"<<endl;
        }
        else{
            cout<<"No, a stop codon does not exist"<<endl;
        }
    }
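> case 'AUG' :

This is just wrong, very wrong. 'AUG' is not a string: it is a multicharacter literal, an int with an implementation-defined value. seq1[i] is a single char, so a switch on it can never match a three-letter codon.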

It would help if you gave us the details of your assignment so that people can help you further.

P.S.: I mean, give us some sample output.
closed account (4ybDGNh0)
So, say you input an RNA sequence such as: AGAAUGGCGATCGATCGATCGUAACGAGC

I want code that recognizes AUG as a start codon and UAA as a stop codon.

So I want code that scans the entire string and looks for the letters "AUG", "UAA", "UAG", and "UGA".

Input: AGAAUGGCGATCGATCGATCGUAACGAGC

Output: "Yes there is a start"
"Yes there is a stop"
The following should do what you want.

#include <iostream>
#include <string>
#include <iomanip>

using namespace std;

int main()
{
	std::string DNA;
	std::cout << "Enter a DNA sequence : "; std::cin >> DNA;

	std::string codon;
	std::string final_DNA;

	bool bHasStartColon = false;
	bool bHasEndColon = false;

	for(decltype(DNA.size()) i = 0; i + 3 <= DNA.size(); i++) // i + 3 avoids unsigned wrap-around when DNA is shorter than 3
	{
		codon = DNA.substr(i, 3);

		// The start colon is encountered

		if(codon.compare("AUG") == 0)
		{
			final_DNA += std::string("AUG");

			bHasStartColon = bool(true);

			// Start extracting until UUA is encountered
			for(i = i + std::string("AUG").size(); i < DNA.size(); i++)
			{
				if(i + 3 <= DNA.size()) // a full codon still fits, including the last one
				{
					codon = DNA.substr(i, 3);

					if(codon.compare("UUA") == 0)
					{
						final_DNA += "UUA";
						bHasEndColon = bool(true); break;				
					}
				}

				final_DNA += DNA[i];
			}
			break;
		}
	}

	std::cout << "Has start colon : " << std::boolalpha << bHasStartColon << std::endl;
	std::cout << "Has end colon : " << std::boolalpha << bHasEndColon << std::endl;
	std::cout << "Extracted DNA sequence : " << final_DNA << std::endl;

	return 0;
}


Enter a DNA sequence : AGAAUGGCGATCGATCGUUACGACG
Has start colon : true
Has end colon : true
Extracted DNA sequence : AUGGCGATCGATCGUUA


http://cpp.sh/2mgta
> The following should do what you want.

From simple inspection, it doesn't appear to. A stop codon is not detected if it occurs before a start codon, or if no start codon is detected at all. That may or may not be reasonable, given the ambiguity of the OP's sample output. However, only one stop pattern is searched for, and it is "UUA" rather than any of the three stop codons (UAA, UAG, UGA), which clearly doesn't do what he wants.

#include <iostream>
#include <string>
#include <vector>

// returns true if any string in list is present in text.
bool is_in(const std::string& text, const std::vector<std::string>& list) {
    for (const auto& element : list)
        if (text.find(element) != std::string::npos) return true;

    return false;
}

int main() {
    std::vector<std::string> start_codons = { "AUG" };
    std::vector<std::string> stop_codons = { "UGA", "UAG", "UAA" };

    std::string sequence;

    std::cout << "Enter a DNA sequence:\n> ";
    std::cin >> sequence;

    if (is_in(sequence, start_codons))
        std::cout << "Start codon is present.\n";
    else std::cout << "Start codon is not present.\n";

    if (is_in(sequence, stop_codons))
        std::cout << "Stop codon is present.\n";
    else std::cout << "Stop codon is not present.\n";
}

So I even extract the RNA sequence as a bonus. If you just want to check whether a certain start colon or stop colon is in the sequence, that is very easy.
> So I even extract the RNA sequence as a bonus.

Which sequence do you extract if only a stop codon is present? If there is more than one sequence present? If there is more than one stop codon present after a single start codon? If two start codons precede any number of stop codons?

It seems like your reach may be exceeding your grasp as far as your bonus goes.

Codon, not colon.

> Which sequence do you extract if only a stop codon is present?

None.

> If there is more than one sequence present?

You would probably need a vector.

> If there is more than one stop codon present after a single start codon?

The rest is ignored after a stop codon is found.

> If two start codons precede any number of stop codons?

Could be a biology error.

Disclaimer: I don't know much about biology. Thanks for explaining that there are three stop codons, not just one.
Using std::regex:
#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::regex start("(AUG)");
    std::regex stop("(UGA)|(UAG)|(UAA)");
    std::string input = "AGAAUGGCGATCGATCGATCGUAACGAGC";

    std::size_t start_count {};
    std::sregex_iterator itr_start(input.begin(), input.end(), start);
    std::sregex_iterator itr_start_end;
    for (; itr_start != itr_start_end; ++itr_start)
    {
        if(std::cout << itr_start ->str() << " " << itr_start -> position() << '\n')
        {
            ++start_count;
        }
    }
    if(!start_count)std::cout << "No matches for start \n";

    std::size_t stop_count {};
    std::sregex_iterator itr_stop(input.begin(), input.end(), stop);
    std::sregex_iterator itr_stop_end;
    for (; itr_stop != itr_stop_end; ++itr_stop)
    {
        if(std::cout << itr_stop ->str() << " " << itr_stop -> position() << '\n')
        {
            ++stop_count;
        }
    }
    if(!stop_count)std::cout << "No matches for stop \n";
}

OP: I'm not sure how familiar you are with std::regex, so I'm not providing a step-by-step breakdown of the program. You already have a perfectly good solution to your problem, but in case you're interested in exploring std::regex for this or future problems, you can start here: http://www.cplusplus.com/reference/regex/ECMAScript/ and then follow wherever Google takes you. Come back here if something is still unclear.
Using std::regex
#include <iostream>
#include <string>
#include <regex>
#include <iomanip>

int main ()
{
    const std::string rna = "AGAAUGGCGATCGATCGATCGUAACGAGC" ;
    std::cout << "rna: " << std::quoted(rna) << '\n' ;

    // [ACGTU]*? - non-greedy match for string between start/stop
    const std::regex re( "[ACGTU]*(AUG)([ACGTU]*?)(UGA|UAG|UAA)[ACGTU]*" ) ;

    std::smatch match ;
    if( std::regex_match( rna, match, re ) )
    {
        std::cout << "start: at " << match.position(1) << " found " << std::quoted( match[1].str() ) << '\n'
                  << "  end: at " << match.position(3) << " found " << std::quoted( match[3].str() ) << '\n'
                  << "sequence between start and stop: " << std::quoted( match[2].str() ) << '\n' ;
    }

    else std::cout << "invalid rna or no start-stop sequence\n" ;
}

http://coliru.stacked-crooked.com/a/19f59f55ba025bde
http://rextester.com/JPSDEQ88022