Ignoring the Last K Lines of a File

Hi all,

I have a file that I want to process except
the last K lines. The data looks like this:

1
2
3
4
5
foo
bar
qux
hum
nid

The actual dataset has around 10^7 lines.

I want to process all of them except the last K = 3.
What's the best way to do it?
I am stuck with this:

#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;

int main () {
    string line;
    ifstream myfile ("mydata.txt");
    vector<vector<string> > dataTable;
    if (myfile.is_open())
    {
        // Test the read itself instead of eof(): the original loop never
        // read anything into `line`, so it never reached the end of the file.
        while (getline(myfile, line))
        {
            stringstream ss(line);
            string Tag;

            ss >> Tag;
            cout << Tag << endl;

            // if (???) break;
            // I am stuck here.
            // We also don't know in advance how many lines
            // the data contains.
        }
        myfile.close();
    }
    else cout << "Unable to open file";

    return 0;
}
You need to read all the lines into a vector, then just erase the last k elements.
I'm guessing you don't have enough memory to store the entire file at once if it has 10^7 lines.

In that case, I'd suggest the following algorithm:

create an empty deque
while ( a line can be read from the file ) {
    put the line at the end of the deque
    if ( deque has k + 1 lines in it ) {
        pop the first element off the deque
        process it
    }
}

When the algorithm terminates, the deque will hold the last k lines of the file, unprocessed.
Hi jsmith,

You are right, memory is the problem. But with your algorithm, don't we still need to hold 10^7 - 3 lines
in the deque?

Is it possible to avoid storing them? We just want to process the file line by line.
Sorry, I didn't see the number of lines.

No, jsmith's method only keeps up to k + 1 lines (4 here) in memory at once.