For example, BsaXI is the acronym.
The sequences are:
ACNNNNNCTCCNNNNNNNNNN'
NNN'NNNNNNNGGAGNNNNNGT
GGAGNNNNNGTNNNNNNNNNNNN'
NNN'NNNNNNNNNACNNNNNCTCC
Here's what I'm attempting to do: read up to the first '/' character, which gives the acronym. Then I take the substring up to the next '/' as a recognition sequence for that acronym. I keep parsing through the string this way and insert each sequence into my data structure.
ifstream theFile;
theFile.open(db_filename);
if(theFile.is_open()){
string aFileLine;
while(getline(theFile, aFileLine)){
// Sets the enzyme acronym
string anEnzymeAcronym = aFileLine.substr(0,aFileLine.find('/'));
string aRegoSequence;
// Keeps track of the starting position of the Recognition Sequence
int tracker = 0;
int wholeStringLength = aFileLine.length();
string test= "";
// While the tracker is not up to the 2nd last '/'
while(tracker != wholeStringLength - 2){
string a = aFileLine.substr(tracker, aFileLine.find('/'));
// Updates tracker position to read the next recognition sequence
tracker += a.length()+2;
aRegoSequence = aFileLine.substr(tracker, aFileLine.find('/'));
// Creates a SequenceMap object with the recognition sequence and enzyme acronym
SequenceMap aSequenceMap(aRegoSequence, anEnzymeAcronym);
// Inserts the sequence and acronym into the tree
a_tree.insert(aSequenceMap);
}
}
}
else{
cout << "No file exists!\n";
}
Since you didn't post your whole code, I can only suggest how to approach it.
Prepare a vector to store the recognition sequences.
1. Read each line with std::getline()
2. Import the line you have read with a std::istringstream.
3. Do a std::getline() with delimiter '/' to get the enzyme name.
4. If the enzyme is really the acronym you need, then:
4.1. Repeatedly call std::getline() with delimiter '/' to get the recognition sequences, and push each one into the vector.
If you follow these instructions closely you will never fail. Be sure to know what a std::istringstream is.
I was planning to use std::getline with '/' as the delimiter, but I don't know how to skip the enzyme acronym, because if I do something like
getline(theFile, aRegoSequence, '/');
it's going to read up to the first '/', which would give the enzyme acronym. So I opted to keep a counter, parse through the string, and track the position. I don't think storing the sequences in a vector is necessary, since I create a SequenceMap from the sequence and the acronym and use my own insert function to place it into my tree.
If you follow these instructions closely you will never fail.
What I meant was: do you know std::istringstream? When you read a whole line with std::getline(), pass the string to a std::istringstream and let it do the job.
getline(theFile, aRegoSequence, '/');
I never told you to do that. I want you to read a whole line, pass it to a std::istringstream, and then use std::getline() on that stream with delimiter '/'.
For your domain, it would be worthwhile to learn to use the regular expressions library. http://en.cppreference.com/w/cpp/regex
It would come in very handy, over and over again.
Here's an example of using regular expressions to do this particular task:
#include <iostream>
#include <regex>
#include <string>
#include <map>
#include <set>
#include <sstream>
// rec_seq_map_type maps an acronym (key) to a set of all its recognition sequences
using rec_seq_map_type = std::map< std::string, std::set<std::string> > ;
rec_seq_map_type get_rec_seq_from( std::istream& stm )
{
rec_seq_map_type map ;
std::string line ;
while( std::getline( stm, line ) )
{
// parse the line into / delimited tokens
// a token is a sequence of one or more characters other than /
const std::regex re( "[^/]+" ) ;
std::sregex_iterator iter( line.begin(), line.end(), re ), end ;
if( iter != end ) // if there is at least one match
{
// the first token, the key
auto& set = map[ iter->str() ] ; // reference to the set associated with this key
// the remaining tokens are the recognition sequences; insert them into the set
for( ++iter ; iter != end ; ++iter ) set.insert( iter->str() ) ;
}
}
return map ;
}
int main()
{
std::istringstream file(
"BsaJI/C'CNNGG//\n""BsaWI/W'CCGGW//\n""BsaXI/ACNNNNNCTCCNNNNNNNNNN'/NNN'NNNNNNNGGAGNNNNNGT//\n""BsaXI/GGAGNNNNNGTNNNNNNNNNNNN'/NNN'NNNNNNNNNACNNNNNCTCC//\n""BsbI/CAACACNNNNNNNNNNNNNNNNNNNNN'/NN'NNNNNNNNNNNNNNNNNNNGTGTTG//\n""Bsc4I/CCNNNNN'NNGG//\n""BscAI/GCATCNNNN'NN/'NNNNNNGATGC//\n""BscGI/CCCGT/ACGGG//\n""Bse1I/ACTGGN'/NC'CAGT//\n" ) ;
const auto rec_seq_map = get_rec_seq_from(file) ;
for( const auto& pair : rec_seq_map )
{
std::cout << "recognition sequences for acronym: " << pair.first << '\n' ;
for( const auto& str : pair.second ) std::cout << str << '\n' ;
std::cout << '\n' ;
}
}
ifstream theFile;
theFile.open(db_filename);
if(theFile.is_open()){
string aFileLine, randomLine;
// Skip the first 10 lines of the file
for(int i = 0; i < 10; i++){
getline(theFile, randomLine, '\n');
}
while(getline(theFile, aFileLine)){
// Sets the enzyme acronym
string anEnzymeAcronym = aFileLine.substr(0,aFileLine.find('/'));
string aRegoSequence;
// Keeps track of the starting position of the Recognition Sequence
int tracker = anEnzymeAcronym.length() +1;
// While the tracker is not up to the 2nd last '/'
while(tracker != aFileLine.length() - 1){
string remainingString = aFileLine.substr(tracker);
aRegoSequence = aFileLine.substr(tracker, remainingString.find('/'));
// Updates tracker position to read the next recognition sequence
tracker += aRegoSequence.length()+1;
// Creates a SequenceMap object with the recognition sequence and enzyme acronym
SequenceMap aSequenceMap(aRegoSequence, anEnzymeAcronym);
// Inserts the sequence and acronym into the tree
a_tree.insert(aSequenceMap);
}
}
}
else{
cout << "No file exists!\n";
}
I'm not sure why this doesn't work, because I think my logic is correct. I tried using the same logic in cpp shell and it works perfectly.