Large Data File Access

Oct 1, 2009 at 9:48pm
Suppose I have a very large data file that I don't want to read into memory for efficiency reasons. Can I tell ifstream to jump to a certain line of the file, or must I read the file sequentially? What is the most efficient way to handle large files like this?
Oct 1, 2009 at 10:05pm
You can jump to a byte offset within the file because bytes have a fixed size. You can't jump to a specific line because lines vary in length, so you can't tell where a given line begins without reading everything before it.

When files exceed a certain size, they should use smarter formats that store more information about their contents. For example, a really long text file could be stored alongside a table containing the offset where each line starts, or where every tenth line starts.
Oct 1, 2009 at 10:19pm
Tell me if I understand this properly: when writing the file, every so many lines I should check where the write buffer is, record that offset in a vector or something, then write the vector into the file at the end? Would it be even more efficient to put the table in a separate file?
Oct 1, 2009 at 10:27pm
write the vector into the file at the end?
If you write it at the end, how will you know where it starts?

Would it be even more efficient to put the table in a separate file?
Yes.

It's probably a good idea to open the file in binary mode, so the offsets you record aren't distorted by newline translation.
Oct 1, 2009 at 10:32pm
If you write it at the end, how will you know where it starts?

Duh, I guess that one was kind of obvious.
Thanks for the help, Helios. Now I have a solid strategy to play with. That's all I'm really doing anyway, playing with files and trying to learn something today. Thanks again.
Topic archived. No new replies allowed.