Hello,
I am trying to read in a large text file (~3GB). I know the exact cursor position I need to get to before I start reading. When I use seekg(), it works correctly up to a point, and then it no longer works. I think what is happening is that, as it scans through each character, it is saving it in memory, so when I try to access the data at the end of the file, I run out of memory and it fails. Is there a way to get the same functionality as seekg(), but have it discard everything it skips (or save it to a small buffer that gets cleared) so that it does not use up much memory? Thanks.
I don't think seekg() works the way you think it does. I suspect something else is going wrong. Perhaps if you post some code, people can try to figure out what the trouble might be?
So I have a large data file; I'll call it my base file. I take the base file and pad each line on the right with blank spaces so that every line has the same number of columns. That way, I can jump to the beginning of any line with seekg(). In total, the file has 2,979,199 lines. I want to access the last data set, which starts on line 2,979,100. If I pad the file so that each line has 650 columns (651 including \n), seekg() can (pretty quickly) grab the data on line 2,979,100. If I pad the file so that each line has 801 columns (802 including \n), seekg() cannot grab the data at line 2,979,100. It only gets to about line 2,600,000 (I forget exactly where it stops) before it starts returning nothing. The file padded to 650 columns is ~1.8GB; the file padded to 801 columns is ~2.2GB. My laptop has 3GB of memory, with 256MB shared for video (I think). When I try to open the 801-column file in a text editor, it gets really close to loading completely and then crashes. I'm pretty sure that is because it runs out of memory. My C++ code is given below in case you notice something. Thanks
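The file sizes quoted above can be checked with a little arithmetic. This is a sketch using only the line count and line widths from the post. The 651-byte version stays below 2^31 = 2,147,483,648 bytes (2 GiB), while the 802-byte version goes over it. That hints at a 32-bit offset limit rather than a memory limit. The names `kLines` and `padded_size` are illustrative only.

```cpp
#include <cstdint>
#include <limits>

// Line count quoted in the post.
constexpr std::int64_t kLines = 2979199;
// Largest value a 32-bit signed int can hold: 2^31 - 1 = 2,147,483,647.
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();

// Total size in bytes of a file whose every line occupies `width` bytes
// (padded columns plus the '\n').
constexpr std::int64_t padded_size(std::int64_t width) { return kLines * width; }

static_assert(padded_size(651) == 1939458549, "~1.8 GiB");
static_assert(padded_size(802) == 2389317598, "~2.2 GiB");
static_assert(padded_size(651) <= kInt32Max, "650-column file fits in 32 bits");
static_assert(padded_size(802) > kInt32Max, "801-column file does not");
```

Because everything is checked with `static_assert`, the block compiles only if the arithmetic holds. It needs no runtime.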
Is there any reason you are using a double for the line number? Normally you would use an unsigned int, or more properly the type std::streamoff, a bit like this:
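This is a rough sketch of the std::streamoff approach, not necessarily the code originally posted. It assumes every line is padded to the same byte width (including `\n`) and that the line index counts from 0. If your line numbers start at 1, pass `line - 1`. The helper name `read_padded_line` is hypothetical, and so is the step that strips the padding spaces. The point is that the multiplication happens in `std::streamoff`, which is 64-bit on common toolchains, so it does not overflow for a 2-3 GB file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read the line with 0-based index `line` from a stream
// whose lines are all padded to exactly `width` bytes (including '\n').
std::string read_padded_line(std::istream& in, std::streamoff line,
                             std::streamoff width) {
    assert(line >= 0 && width > 0);
    in.clear();  // clear eof/fail bits left over from an earlier read
    // Offset computed in std::streamoff, never narrowed to int.
    in.seekg(line * width, std::ios::beg);
    std::string text;
    if (!std::getline(in, text)) {
        return {};  // seek past the end, or the read failed
    }
    // Strip the right-padding spaces that made every line the same width.
    const auto last = text.find_last_not_of(' ');
    text.erase(last == std::string::npos ? 0 : last + 1);
    return text;
}
```

With a real file you would pass an `std::ifstream` opened on the padded file and a width of 802. For a quick check, an `std::istringstream` holding `"ab  \ncd  \nef  \n"` with width 5 returns `"cd"` for line 1.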
hey,
I was actually in the middle of asking about using the
double line = 2979100;
rather than an int when my internet connection crashed (twice). When I used an int, it would compile but give me a warning that the value in the argument of seekg() was out of range. When I used a double and cast it to an int, it didn't give me that warning, so I just assumed it worked. I think that may have been the issue: I was trying to access a line outside the range of whatever type they used (short int?). Your suggestion of using std::streamoff worked perfectly! Thanks a lot, Galik, you've helped me a ton the past couple of days!
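The numbers in this thread suggest the limit was a 32-bit int (maximum 2,147,483,647) applied to the byte offset, not a short. A short tops out at 32,767, which would have failed almost immediately. The line number 2,979,100 fits in an int, but the offset line × 802 does not. If the product was computed as a double and then cast to int, the cast only hid the warning. Converting an out-of-range floating-point value to int is undefined behaviour in C++. The sketch below computes the cutoff under that 32-bit assumption. The names are illustrative only.

```cpp
#include <cstdint>
#include <ios>
#include <limits>

// Largest value a 32-bit signed int can hold: 2,147,483,647.
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();

// Last 0-based line whose start offset (line * width) still fits in a
// 32-bit int; any later line needs a wider offset type.
constexpr std::int64_t last_int32_line(std::int64_t width) {
    return kInt32Max / width;
}

// True when the offset for `line` would overflow a 32-bit int.
constexpr bool needs_wide_offset(std::int64_t line, std::int64_t width) {
    return line > last_int32_line(width);
}

// The same offset computed in std::streamoff (64-bit on common toolchains).
constexpr std::streamoff line_offset(std::streamoff line, std::streamoff width) {
    return line * width;
}
```

For 802-byte lines the cutoff is line 2,677,660, close to the "~2,600,000" observed above. For 651-byte lines it is 3,298,746, past the end of the 2,979,199-line file, which is why the 650-column version worked.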