The code below is meant to put the first column of an input file into array gene1[] and the second column into gene2[]. I'm reading line by line, searching for a space (the gene names are variable length) and using that position to chop the string with line.substr(). The cout statements won't be needed in the end, but I've put them in to try to understand what's happening. It seems that the string is not being split; maybe find_first_of() is not locating a space? The values output for gene1[i] and gene2[i] are the entire string (i.e. line). Any advice appreciated.
size_t spacelocation2; // variable to hold location of space in infile1
while (!getline(infile1, line).eof()) // check for end of file
{
    cout << "The first line is - " << line << endl;
    spacelocation2 = line.find_first_of(" ", 0); // find space location
    gene1[i] = line.substr(0, spacelocation2-1);
    cout << "First gene is " << gene1[i];
    gene2[i] = line.substr(spacelocation2, line.length()-spacelocation2);
    cout << " and second gene is " << gene2[i] << endl;
    i++;
}
I recommend providing a few more clues. Can you post an example of the values you are seeing in the debugger? I assume you are stepping through the code; this should be fairly simple to debug with a debugging GUI. Inside the while loop, what is the value of line? What position is returned and stored in spacelocation2?

The substr calls are slightly off, because the second argument is a length rather than an end position: substr(0, spacelocation2-1) drops the last character of the first gene, and substr(spacelocation2, ...) keeps the space at the start of the second. That would not explain your output, though. If a space wasn't found, you would have bigger problems, because find_first_of returns string::npos in that case. substr(0, npos-1) then returns the whole line, which matches what you see for gene1, but substr(npos, ...) should throw std::out_of_range, so I'm not sure how you would end up with the entire string in gene2 as well.

The bottom line is that you need to test the return value of find_first_of before using it. You are assuming that a space will always be found. Also, the second argument to find_first_of is optional; it defaults to 0, so you don't need to specify it when searching from the beginning.
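As a minimal sketch of that check, assuming a single-character delimiter: the helper name splitAtFirst and its signature are made up for illustration and are not from your code. It only writes to the outputs when the delimiter was actually found.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): splits `line` at the
// first occurrence of `delim`. Returns false and leaves both outputs
// untouched when the delimiter is absent.
bool splitAtFirst(const std::string& line, char delim,
                  std::string& first, std::string& second)
{
    const std::size_t pos = line.find(delim);
    if (pos == std::string::npos) {
        return false;  // no delimiter: the caller decides how to handle the line
    }
    first = line.substr(0, pos);    // second argument is a length: `pos` characters
    second = line.substr(pos + 1);  // skip the delimiter, take the rest of the line
    return true;
}
```

In the loop you would call something like splitAtFirst(line, ' ', gene1[i], gene2[i]) and only increment i when it returns true; a false result on every line is a strong hint that the delimiter is not what you expect.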
Thanks for the pointers. I dropped the 0 parameter and added a test to make sure spacelocation2 is valid. The problem was that I trusted the data provider and didn't check the input file: it was tab-delimited, not space-delimited. I changed the search to "\t" and, with a bit of tweaking, all is well.
Thanks for your help, sorry it was such a silly error - I really should have found that myself!
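For anyone who lands here with the same symptom: one way to avoid depending on the exact delimiter is to let stream extraction split on any whitespace, tabs and spaces alike. The sketch below is only an illustration under some assumptions: readGenePairs is a hypothetical name, it fills std::vector containers rather than the fixed arrays used above, and it assumes gene names never contain whitespace themselves.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads two whitespace-separated columns from `in`.
// Lines with fewer than two fields are skipped.
void readGenePairs(std::istream& in,
                   std::vector<std::string>& gene1,
                   std::vector<std::string>& gene2)
{
    std::string line;
    // Testing the stream itself stops on end of file or a read error, and
    // still processes a final line that has no trailing newline.
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first, second;
        if (fields >> first >> second) {  // operator>> skips spaces and tabs
            gene1.push_back(first);
            gene2.push_back(second);
        }
    }
}
```

Looping on std::getline(in, line) directly, rather than on .eof(), is a deliberate choice here: the .eof() form drops a last line that lacks a newline and never terminates if the stream fails for another reason.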