Hi all,
Complete newbie here. I am a statistician trying to write a program to process a very large data file. I'd be very grateful for advice from any of you proper programmers! The full code is shown at the end of the post.
I am trying to read some text input (gene names and SNP names) from a couple of different files and store it in some data structures (strings, I think) so that I can later compare input from another very large file against them (still to be coded!). I think strings would be handy so I can use strcmp(). My problem is handling the input from my ifstream object. At first I tried the >> operator, but that doesn't seem to work with my ifstream object (see the commented-out code and error message). I'm not sure why, as this page
http://www.cplusplus.com/reference/iostream/ifstream/ seems to suggest it should. I then decided to use getline and pick out the characters I needed. getline works up to a point: I seem to be able to put the whole input line into a string (see the first bit of code, using inFile1), but I have problems when I try to break it up and store it in different structures (second bit, using inFile2). I think I'm using the indices wrongly. I expected the cout statements to show a nice list of gene names (first loop) and SNP names (second loop). Instead the first cout statement produces:
N
S
G
0
0
0
0
0
2
1
5
7
7
8
\342
\217
repeated over and over again. 'ENSG00000215778' is the name of the last gene in the file (note the dropped 'E'). The second produces:
\342
\217
\342
\217
repeated many times. I've tried changing the indexing around but I can't get anything more sensible.
Any help with either improving this code or trying a different approach would be very gratefully received.
Thanks,
Jen
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main (int argc, char * const argv[]) {
    const int MAXLENGTH = 70;
    const int MAXCHARS = 67;
    const int NGENES = 1165;
    const int SNAMELENGTH = 13;
    const int GNAMELENGTH = 16;
    const int NSNPS = 15010;
    ifstream inFile1;
    ifstream inFile2;
    int i=0; //counts lines in infiles
    int j=0;
    int k=0;
    char line[MAXCHARS]; //holds whole line of input to be split into 3 vars
    char gene2snp[MAXLENGTH] = "gene2snps_sorted.txt";
    char genelist[MAXLENGTH] = "genelist.txt";
    string genes[NGENES]; //array of strings holding sorted gene names
    string g2ssnp[NSNPS][SNAMELENGTH]; //array of strings holding sorted snp names
    string g2sgene[NSNPS][GNAMELENGTH]; //array of strings holding sorted gene names associated with g2ssnp[]

    //read in list of gene names to genes[]
    inFile1.open(genelist, ios::in);
    if (inFile1.fail()) //check for successful open
    {
        std::cout<<"\nThe file was not successfully opened. Please check it exists." << endl;
        exit(1);
    }
    while (!inFile1.eof()) //check for end of file
    {
        // inFile1 >> genes; // this commented-out code doesn't work
        inFile1.getline(line, GNAMELENGTH,'\n');
        genes[i] = line;
        i++;
    }
    inFile1.close();
    cout << genes[0] << '\n' << genes[10];
    i=0;

    //open gene2snp list for reading in
    inFile2.open(gene2snp, ios::in);
    if (inFile2.fail()) //check for successful open
    {
        std::cout<<"\nThe file was not successfully opened. Please check it exists." << endl;
        exit(1);
    }
    while (!inFile2.eof()) //check for end of file
    {
        // inFile2 >> g2sgene >> g2ssnp; // doesn't work - error message 'error: no match for 'operator>>' in 'inFile2' >> g2sgene'
        inFile2.getline(line, GNAMELENGTH,'\n');
        {
            //put chars 0-15 and 17-28 into g2sgene and g2ssnp
            for(i=0;i<15;i++)
            {
                g2sgene[i][j]=line[i];
                cout << g2sgene[i][j]<<'\n';
            }
            for(i=17;i<27;i++)
            {
                g2ssnp[k][i-17]=line[i];
                cout << g2ssnp[k][i-17] << '\n';
            }
            j++; k++;
        }
    }
    inFile2.close();
}