I'm looking to write a program in C/C++ to traverse a FASTA file formatted like this:
>ID and header information
SEQUENCE1
>ID and header information
SEQUENCE2
SEQUENCE2CONTINUED
and so on
I would like to pre-populate a list of all of the sequences. How would I skip the header lines using fstream to accomplish this? Should I use the > character to tell where one record ends and the next begins?
Any general advice, or pointers in the right direction, for using fstream to read in a file like this?
I want to read each entire header line (the one starting with >) and write it to another file. Then I need to take the whole sequence under that line (which can span one or more lines) and compare it to all the other sequences I have (the list will be in the thousands). If that sequence is unique (not contained within any other sequence), it is written to the output file under its header. If not, it is discarded.
It seems like it would be easier to first load all of the sequences into a data structure. Then, as I go back through the file writing out the headers, I could check the sequence that belongs to each header against that data structure. Otherwise, wouldn't it be a pain to compare each sequence to everything else in the file with the headers mixed in?