Using Fstream in C++

I'm looking to write a program in C/C++ to traverse a FASTA file formatted like:

>ID and header information
SEQUENCE1
>ID and header information
SEQUENCE2
SEQUENCE2CONTINUED

and so on

I would like to pre-populate a list of all of the SEQUENCEs. How would I skip the header lines using fstream to accomplish this? Would I use the > character to split up the input?

Any general advice, or pointers in the right direction, for using fstream to read in a file like this?
Something like the following (where the stream "in" would be replaced with a std::ifstream):

// http://ideone.com/g1IX6H
#include <fstream>   // needed once "in" becomes a std::ifstream
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Sample FASTA data; a real program would read this from a file.
const std::string data_text =
R"(>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
)";

// Stand-in for the input file, e.g. std::ifstream in("input.fasta");
std::istringstream in(data_text);

int main()
{
    std::string line;
    std::vector<std::string> sequences;
    bool in_sequence = false;   // true while reading lines of the current sequence

    while (std::getline(in, line))
    {
        if (in_sequence)
        {
            if (!line.empty() && line.front() != '>')
                sequences.back() += line;      // continuation line: append it
            else if (!line.empty() && line.front() == '>')
                in_sequence = false;           // header line: current sequence ended
        }
        else if (!line.empty() && line.front() != '>')
        {
            sequences.push_back(line);         // first line of a new sequence
            in_sequence = true;
        }
        // blank lines and header lines outside a sequence are skipped
    }

    for (const auto& seq : sequences)          // const reference avoids copying
        std::cout << seq << "\n\n";
}
Do you mean that you want to read entire lines at once? You can use "getline" for that.

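std::ifstream inFile("input.fasta");  // hypothetical file name
std::string myString;
while (std::getline(inFile, myString)) {
    // do whatever we want here with myString
}
// remember to close the streams (inFile.close(), or let the destructor do it)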


As far as extracting the information you want out of it, you will have to do the parsing yourself.
The full question is here: http://www.cplusplus.com/forum/general/150561/

I want to read the entire header line, including the >, and write it to another file. Then I need to take the whole sequence under that line (it can span one or more lines) and compare it to all the other sequences I have (the list will be in the thousands). If that sequence is unique (not contained in any other one), it is written to the output file under its header. If not, it is discarded.

It seems like it would be easier to first read all of the sequences into a data structure. Then, as you go down the file writing the headers, you could check the sequence that belongs to each header against that data structure. Otherwise, wouldn't it be a pain to compare each sequence to everything else in the file with the headers mixed in?