Data Storage Help

I am writing a program that will deal with several different files of different lengths but with the same format. I need to read through each line first to find a maximum value for a line and then I need to reprint each line with one modification as an output file. I have a working program but it is cumbersome and difficult to follow. One problem is that it reads the input file from start to finish twice, once to find the maximum and then again to write the output file with the slight modification.

What would be a good way to store each line from the input file in memory so that I can access it again once I am ready to write the output file? I have looked into arrays, but I have two main problems with them: first, the number of lines is not the same for each file, so the size of the array varies; and second, some of the files contain over 500 lines. I tried to create an array with 100 cells, but that did not work.

I have considered creating an object which stores each line as a public property which can then be easily accessed at a later time. But this would result in an object with 500 or more properties. Is this an effective solution?

I have also looked into vectors, though I must admit I was unable to follow the tutorial here at cplusplus.com well enough to implement them. Are they the solution I am looking for, and should I invest some more time in understanding how to use them?

If a "line" is just a sequence of characters, std::string is a good way to store one. You can use a std::vector<std::string> (a vector of strings) to store all the lines; to add a line to the vector you use the push_back member function of std::vector. Adding all the lines to the vector will look something like this:
#include <fstream>
#include <string>
#include <vector>

std::ifstream inputFile("filename");
std::string line;
std::vector<std::string> lines;
while (std::getline(inputFile, line))
{
	lines.push_back(line);
}
Would using a vector and storing 500-1000 lines (30 characters each) be too much? Would I have memory or performance problems?
Not really. Realize that a program like Word can hold a 20-page essay in a few KB, and iTunes runs on about the same scale. You will not use up your RAM making that vector. Also remember to get picky about how you sort the lines, because sorting is where the time will start to go.
Alright, I've got a problem now. I have some code that compiles but crashes with a segfault (signal 11). I have tracked it down a bit and I believe it has something to do with assigning values to the two-dimensional vector. I don't think the issue lies where I actually assign values to it, but instead with being able to access the file.

//Create the vector
std::vector< std::vector<std::string> > vect;

//Open the file
std::string inf = "file.csv";
std::ifstream infile;
infile.open(inf.c_str());
if (!infile.is_open()){
  cout << "Unable to open "+inf << endl;
}

//Read through the file to set the length of the vector
int n=0;
while(infile.good()){
  getline(infile,inf);
  n++;
}
vect.resize(n);
infile.seekg(0,ios::beg);

//Writing to the vector
int m = 0; //Counts the rows
//n counts the columns                                                                                           
                                                                                              
string temp;
int delim;
int test;
   
while (infile.good()){
  n=0;
  getline(infile,temp);
  test = temp.size();
  
  if (test==0) {
    return;
  }
  delim=temp.find_first_of('.');
  vect[m][n]=temp.substr(0,delim);
    
  n++;
  //Fills out the columns of the current row
                                                                               
  while(n<6) {
    vect[m][n]=temp.substr(delim+1,temp.find_first_of(',',delim+1)-delim-1);
    delim=temp.find_first_of(',',delim+1);
    n++;
  }
  m++;
}
infile.close();


At the end I know the values were not assigned properly because the following returns a value of 0:

 
cout << vect[0].size();


and the following returns a segfault

 
cout << vect[0][0] << endl;
Try
vect[m].push_back(temp.substr(delim+1,temp.find_first_of(',',delim+1)-delim-1));
because the 2nd dimension of the vector was never set.
Also, you can change this to something easier that will take less memory:
std::ifstream infile;
inf="file.csv";
infile.open(inf.c_str());
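The "easier" form presumably meant here is opening the stream directly, without the intermediate string and c_str() call. A minimal sketch, wrapped in a hypothetical helper so the result can be checked:

```cpp
#include <fstream>
#include <string>

// Try to open `name`; returns false if the file cannot be opened.
// (In C++11 the constructor and open() accept a std::string directly,
// so the separate variable and c_str() call are unnecessary.)
bool openCsv(std::ifstream& infile, const std::string& name)
{
    infile.open(name);
    return infile.is_open();
}
```

With this, opening becomes one call: `std::ifstream infile; if (!openCsv(infile, "file.csv")) { /* report and bail out */ }`.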
1) Do you really need a multidimensional vector?
2) Do not read the entire file just to find out how many lines are needed to size the vector. You already stated 500 (or was it 1000?) lines, so go ahead and reserve your vector for 1200 (or something arbitrarily close). In the first loop, just push_back each line.
I'll try to include those improvements, but I have found a further difficulty. If you add a cout statement at line 30 of the code above, it is never reached. I know the file is open, but it is almost as if, after I read through the file to check its length, the file cannot be read again from the beginning.

And the reason I want to use a vector and set its size dynamically is because I cannot be sure what the length of future files I will be dealing with will be. I would rather write something now that can adapt to larger files without need for rewriting.
1) And the reason I want to use a vector and set its size dynamically is because I cannot be sure what the length of future files I will be dealing with will be. I would rather write something now that can adapt to larger files without need for rewriting.
Here is a little tip: I/O operations are always going to be the bottleneck, not dynamically allocating memory. Reading the file contents once just to count lines is not good practice, which is why I am trying to steer you onto the right track. Right now you have a small data set. What if your file were 40GB? Would you really read it twice? No; you wouldn't even be able to have it cached all at once. You can reserve much more; why not reserve 5000? If files get larger than that, the vector will allocate for you, and it is still not going to be the bottleneck. You don't even need to reserve: just push_back will be much more efficient than reading twice.

2)
If you add a cout line at line 30 from the code above the cout is never reached.
It won't hit this line because your file is in a state that is not good (because you've already read the entire contents). Sure, you seek to the beginning, but that doesn't change the state; you have to clear the state to have it read again (having to jump through these hoops to read it again should be a red flag).

3) If you truly want to size the vector and you are in control of the files, you could always make the first line in the file have the count of lines in the file, then just read that line only. Use the size read in and reserve for the vector, now continue reading and pushing back the rest of the file.


Good luck.
Alright, I've got the state of the file set back so that it can be read again. For now I will stick with reading through the file to determine the size of the vector; I would like to get this working before I make any other major changes. Now I am having a problem assigning values to the vector. It is a two-dimensional string vector. I am currently using the following line to try to assign values to it, but it is not working, for the following reason.

 
vect[m][n].push_back(line.substr(0,','));


This is returning an error

error: no matching function for call to ‘std::basic_string<char, std::char_traits<char>, std::allocator<char> >::push_back(std::basic_string<char, std::char_traits<char>, std::allocator<char> >)’
/usr/include/c++/4.2.1/bits/basic_string.h:869: note: candidates are: void std::basic_string<_CharT, _Traits, _Alloc>::push_back(_CharT) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
You don't need n, you're adding to the end of the vector.

vect[m].push_back(line.substr(0, ',')) ;


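For splitting each line into its fields, an alternative to hand-tracking find_first_of() positions is std::getline with a delimiter on a stringstream; a sketch, with the helper name my own invention:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one comma-separated line into fields using std::getline with a
// ',' delimiter, instead of tracking find_first_of() offsets by hand.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))
        fields.push_back(field);
    return fields;
}
```

Each parsed line then becomes one row of the 2D vector: `vect.push_back(splitCsvLine(temp));`.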
Thanks so much. I've got it working now, I appreciate the help.