seek in a large file stream

I need to use seekg() on an fstream opened on a very large file, but in the Visual C++ STL library the pos type is a 32-bit int limited to 2 GB, so when I seek to a position beyond 2 GB I get the wrong position. On Linux the pos type is a 64-bit int and there is no problem. I know that the typedef of pos is defined in a traits class. Can anyone tell me how to write a traits class and use it to make my fstream support files larger than 2 GB?
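Before writing a custom traits class, it may help to confirm what the implementation actually provides. The sketch below uses two hypothetical helpers (not part of any library) that ask std::numeric_limits how wide std::streamoff is. ifstream::off_type is std::streamoff, and the offsets carried by ifstream::pos_type are bounded by it, so a result of only 31 value bits would be consistent with the 2 GB symptom described above.

```cpp
#include <ios>
#include <limits>

// Hypothetical helpers for checking what the standard stream types can address.
// std::streamoff is the integer type behind ifstream::off_type; seekg(off, dir)
// and the offset stored in ifstream::pos_type are limited by its range.
constexpr int streamoff_value_bits() {
    return std::numeric_limits<std::streamoff>::digits;  // excludes the sign bit
}

constexpr bool streams_can_pass_2gb() {
    // A signed 32-bit type has 31 value bits and tops out at 2^31 - 1 bytes.
    return streamoff_value_bits() > 31;
}
```

Printing streamoff_value_bits() on each toolchain shows directly whether the stream types themselves are the bottleneck.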
I'm afraid you've entered the murky world of memory models.

The general idea is that for each memory model, there's a fixed range and size of element. Within each memory model you can define a size type, an iterator, and so on. Once this is abstracted away neatly, you can use multiple memory models within one application. This was the original intent of std::allocator. There wasn't enough time to do it properly, and it became more of a memory-pool mechanism.

A practical example of memory models in action is programming for MS-DOS, where different memory models mapped onto far/near data and code segments.

I'm afraid you need to use something else to manage positions in your large file. You've exceeded the limit of your implementation. Can I assume that your Windows development is 32-bit and your Linux development 64-bit? Again, different memory models. That would explain the difference in the range of std::streampos.
I don't see the relation between "what assumptions the compiler is allowed to make when generating code for segmented-memory or paged-memory platforms" and the size of a member in a class, especially since both Windows and Linux use paged memory. My question now is "WTF are you talking about?"

As far as I know, it's impossible to do what you ask.
You could wrap the system calls in a function of your own and decide which version to call based on which compiler macros are defined.
I think Boost has a class called boost::filesystem::fstream that uses 64-bit integers for its file pointers and wraps the system calls. You should probably check it out.
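A minimal sketch of wrapping the calls yourself and choosing the version by compiler macro, assuming a C stdio FILE* is acceptable in place of an fstream. The names seek64 and tell64 are hypothetical. _fseeki64/_ftelli64 are MSVC's 64-bit stdio functions; fseeko/ftello are the POSIX equivalents, whose off_t is 64 bits on 64-bit Linux and on 32-bit glibc builds compiled with -D_FILE_OFFSET_BITS=64. The sketch assumes either MSVC or a POSIX C library.

```cpp
#include <cstdint>
#include <cstdio>
#if !defined(_MSC_VER)
#include <sys/types.h>  // off_t
#endif

// Hypothetical wrapper: seek to an absolute 64-bit offset using the
// compiler's 64-bit stdio call. Returns true on success.
bool seek64(std::FILE* f, std::int64_t offset) {
#if defined(_MSC_VER)
    return _fseeki64(f, offset, SEEK_SET) == 0;
#else
    static_assert(sizeof(off_t) >= 8, "off_t is 32-bit: compile with -D_FILE_OFFSET_BITS=64");
    return fseeko(f, static_cast<off_t>(offset), SEEK_SET) == 0;
#endif
}

// Hypothetical wrapper: returns the current 64-bit position, or -1 on error.
std::int64_t tell64(std::FILE* f) {
#if defined(_MSC_VER)
    return _ftelli64(f);
#else
    return static_cast<std::int64_t>(ftello(f));
#endif
}
```

Keeping the #if inside two small functions means the rest of the program only ever sees std::int64_t offsets.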
Are you working on 32-bit Windows and 64-bit Linux?
I'm working with a 32-bit Visual Studio on 64-bit Windows XP, and the Linux system is actually a 32-bit Cygwin on the same 64-bit Windows XP.
I found that the argument of seekg() is ifstream::pos_type. ifstream is a typedef of basic_ifstream<_Elem=char, _Traits=char_traits<char> >, and pos_type is a typedef of _Traits::pos_type. So I think I can use basic_ifstream<_Elem=char, _Traits=MyTraits> to make the stream support large files, but I don't know how to write the class MyTraits.
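The typedef chain described above can be checked at compile time. This sketch only restates it with static_assert; the function name is hypothetical. It confirms that ifstream's pos_type comes from char_traits<char>, which the standard fixes as std::streampos. A replacement traits class would change the declared type, but whether the library's basic_filebuf then actually seeks past 2 GB is up to its implementation.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <type_traits>

// Hypothetical compile-time restatement of the typedef chain:
// ifstream -> basic_ifstream<char, char_traits<char>> -> traits pos_type -> streampos.
constexpr bool ifstream_pos_chain_holds() {
    return std::is_same<std::ifstream, std::basic_ifstream<char, std::char_traits<char>>>::value &&
           std::is_same<std::ifstream::pos_type, std::char_traits<char>::pos_type>::value &&
           std::is_same<std::char_traits<char>::pos_type, std::streampos>::value &&
           std::is_same<std::ifstream::off_type, std::streamoff>::value;
}

// Fails to compile if any link in the chain differs on this implementation.
static_assert(ifstream_pos_chain_holds(), "unexpected ifstream position types");
```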
I don't think this is related to whether the system is 32-bit or 64-bit, because most 32-bit programs I use can handle large files.

Nope. You're wrong. The limit for data accessible using only standard functions is LONG_MAX. This value is implementation-defined, which is why you have no problem using the standard functions in 64-bit mode.
This limit only applies to the standard functions. System calls don't necessarily have this restriction (e.g. WinAPI's SetFilePointerEx(), which takes a 64-bit LARGE_INTEGER union, or Linux's _llseek()).
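A sketch of going through the OS directly. The wrapper os_seek64 and the native_file alias are hypothetical. On Windows it uses SetFilePointerEx() with a LARGE_INTEGER. On the POSIX side it substitutes plain lseek() for _llseek(): as far as I know glibc does not expose _llseek() as an ordinary wrapper, and lseek() already takes a 64-bit off_t on 64-bit Linux or when a 32-bit build defines _FILE_OFFSET_BITS=64.

```cpp
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
using native_file = HANDLE;  // Win32 file handle
#else
#include <sys/types.h>
#include <unistd.h>
using native_file = int;     // POSIX file descriptor
#endif

// Hypothetical wrapper: move the OS-level file pointer to an absolute
// 64-bit offset. Returns the new position, or -1 on failure.
std::int64_t os_seek64(native_file file, std::int64_t offset) {
#if defined(_WIN32)
    LARGE_INTEGER distance;
    distance.QuadPart = offset;  // full 64-bit request
    LARGE_INTEGER new_pos;
    if (!SetFilePointerEx(file, distance, &new_pos, FILE_BEGIN)) {
        return -1;               // GetLastError() has the details
    }
    return new_pos.QuadPart;
#else
    static_assert(sizeof(off_t) >= 8, "off_t is 32-bit: compile with -D_FILE_OFFSET_BITS=64");
    return static_cast<std::int64_t>(lseek(file, static_cast<off_t>(offset), SEEK_SET));
#endif
}
```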
In fact, I can access all the data in the large file: I can read it byte by byte from beginning to end without using seekg() or tellg(). I just can't get the right position when I use seekg() on a basic_ifstream<_Elem=char, _Traits=char_traits<char> >, because ifstream::pos_type is implemented with int in MSVC. I need a new traits class that changes pos_type to __int64.
I googled for strings related to your question before my previous post, and this is the closest thing to a solution I could find:

If there's any way to do what you want to do, it would have probably appeared near the top of the results list, but it didn't. Therefore, I can say with some degree of certainty that it is not possible.

Like you said, it's possible to get() a file byte by byte. This, however, is very slow. The above solution doesn't work with the faster read() method.
I had this problem when working on an antivirus solution. The bottleneck is reading from the hard drive: reading many small chunks takes much longer than reading large chunks.

It's better to read 1 x 100 MB than 10 x 10 MB. It's quicker that way, as Windows will tell you:
You're working beyond the bounds of the standard library's default memory model. That's why your address range is limited as seen from the STL stream library.

If you don't want to get involved with such abstractions, just call the OS directly. Win32 supports files larger than 4 GB and uses 64-bit file positions. For example, to change the file pointer, use SetFilePointerEx(). The position is a LARGE_INTEGER, a union whose QuadPart member is a 64-bit LONGLONG.