Using the fstream class in UNICODE?


What headers are included or in what order should make no difference.

Understand that there's a huge difference between adding a few extensions and completely changing the way standard headers work.
If VC++ 2005 is really doing that (which I doubt; the post-millennium versions of VC++ are great, so there must be some other factor at work), then I suggest you upgrade to the newest version immediately. That behavior is simply unacceptable. First, though, you might want to check whether that option is actually the cause by starting a new project with default settings and changing only that one setting.

I would also like to see the project, if it's possible.

Sephiroth (48)

I can't go into detail on the project on a public forum. I will PM you on the D2 forums with the project and sources.

As far as the options go, if I change back to "Not Set", everything changes from "wchar_t" to "char". I already thought of this and had tried it prior to posting here. In other words, setting the project to ISO-8859-1 makes fstream work exactly like it is depicted on this very site. The problem with that however, is that it isn't Unicode and won't work with my private OpenGL renderer or OpenAL sound engine, both of which are Unicode.

helios (17574)

I'm not registered on any such forum. "Helios" is, unfortunately for me, a pretty popular pseudonym.

Are you saying that you can't link to those libraries if UNICODE is not defined, or that you need to use wchar_t?


Sephiroth (48)

I just figured out the problem. It wasn't VS2005, but a typo in a core class several levels above the class I'm working on. I had made a typo where I declared the file stream, and Visual Studio didn't catch it because the typo, "wfstream" instead of "fstream", is itself a valid type. As soon as I saw that, I knew what had happened.
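The mix-up is easy to make because both names are fixed typedefs in <fstream>: std::fstream is std::basic_fstream<char> and std::wfstream is std::basic_fstream<wchar_t>, and neither is affected by the UNICODE/_UNICODE macros that Visual C++'s "Character Set" option defines. If you do want a stream type that follows that option, a project-level alias is one way to get it. A minimal sketch; `tchar_t` and `tfstream` are hypothetical names modelled on the Windows TCHAR idea, not real library types:

```cpp
#include <fstream>
#include <type_traits>

// Hypothetical project-level alias, modelled on the Windows TCHAR idea:
// the character type follows the UNICODE macro.
#ifdef UNICODE
typedef wchar_t tchar_t;
#else
typedef char tchar_t;
#endif

// A stream built on tchar_t follows the project setting...
typedef std::basic_fstream<tchar_t> tfstream;

// ...but the standard typedefs never change, whatever the setting:
static_assert(std::is_same<std::fstream::char_type, char>::value,
              "std::fstream always uses char");
static_assert(std::is_same<std::wfstream::char_type, wchar_t>::value,
              "std::wfstream always uses wchar_t");
```

With this in place, code that writes tfstream gets a char stream in a "Not Set" build and a wchar_t stream in a Unicode build, while plain fstream stays narrow in both.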

Still, I am using wide characters everywhere in my program, so why shouldn't I use wfstream? Is it just incapable of reading a single byte at a time, since wchar_t is two bytes or more on all platforms? If so, I can use fstream, but I feel like I'm mixing old 8859 crap with the newer Unicode stuff. How badly will that impact performance, since it will be converting back and forth between ASCII and wide characters?

You said "ELF" meant something to you. Could you elaborate here? I thought you were somebody I knew.


Disch (13742)

He probably meant this:

http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

helios (17574)

Ah-ha! I knew it couldn't be the compiler!

Well, here's what I think:
std::basic_ifstream<T>::read() will read one byte at a time from the file and store each byte in one of the Ts in the array you pass. This is the only behavior that makes sense to me, since the size of T is only guaranteed to be at least 8 bits. Because it isn't guaranteed to be any larger, the template can't assume it will be able to write past bit 7, so bits 8 and above will always be zero.
This means you can safely pass an array of wchar_t, but since sizeof(wchar_t) is very likely to be at least 2 (it's 2 on Windows and 4 on most Unix-like systems), that would be a waste.
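For what it's worth, the standard has std::basic_filebuf convert the file's bytes through the codecvt facet of the stream's locale, so the wchar_t values you get depend on that locale. With the classic "C" locale, plain ASCII bytes do come back as one wchar_t per byte with the same value, which matches the behavior described above for the common case. A small sketch to check it; `readAsciiThroughWifstream` is a hypothetical helper, and the file name you pass is just a scratch file:

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Writes the given bytes to a scratch file in binary mode, then reads the file
// back through a std::wifstream imbued with the classic "C" locale.
// For plain ASCII bytes, each byte should come back as one wchar_t.
std::wstring readAsciiThroughWifstream(const char *path, const std::string &bytes) {
    {
        std::ofstream out(path, std::ios::binary);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    } // out is flushed and closed here

    std::wifstream in;
    in.imbue(std::locale::classic()); // byte-to-wchar_t conversion uses this locale's codecvt
    in.open(path, std::ios::binary);
    std::wstring result((std::istreambuf_iterator<wchar_t>(in)),
                        std::istreambuf_iterator<wchar_t>());
    in.close();
    std::remove(path); // delete the scratch file
    return result;
}
```

For example, readAsciiThroughWifstream("wifstream_demo.tmp", "Hi") should give L"Hi". Bytes outside ASCII are where locales and platforms start to differ, which is another argument for reading raw bytes and converting them yourself.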

As for the encoding, you don't need to worry about it. Encodings have no meaning when dealing with binary files. A binary file is merely a stream of bytes which, once you've loaded it, you can interpret any way you want.
For example, the stream 41 00 could be interpreted as:
1. The C string "A" in ISO-8859-1.
2. The C string "A" in UTF-8.
3. The character U+4100 in UCS-2, big endian.
4. The character U+0041 in UCS-2, little endian.
5. None of the above.
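The two UCS-2 readings in the list above differ only in which byte is treated as the high-order one. A minimal sketch of that, with hypothetical helper names:

```cpp
#include <cstdint>

// Combines two bytes into one 16-bit UCS-2 code unit.
// Big endian: the first byte is the high-order byte.
std::uint16_t ucs2BigEndian(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>((b0 << 8) | b1);
}

// Little endian: the first byte is the low-order byte.
std::uint16_t ucs2LittleEndian(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>((b1 << 8) | b0);
}
```

ucs2BigEndian(0x41, 0x00) yields 0x4100, while ucs2LittleEndian(0x41, 0x00) yields 0x0041, which is 'A'. Same bytes, different meaning, chosen entirely by the code doing the reading.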
So, as you can see, it doesn't really matter how you read files. You, the programmer, are the one assigning meaning to the stream.
This is the way it should be. Otherwise, you'd be limited by how the language or the system chooses to interpret the stream. This is why I no longer open files as text, only as binary.
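Opening a file as binary and taking its bytes as-is can be sketched with the standard library alone. `readFileBytes` is a hypothetical name; it only plays the role of the readfile() helper used in the code below, whose real implementation isn't shown:

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Opens the file in binary mode and returns its raw bytes untouched:
// no newline translation and no character-set conversion.
// What the bytes mean is left entirely to the caller.
std::string readFileBytes(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}
```

Because nothing is converted, a file containing 41 00 comes back as a two-character std::string holding exactly those bytes, embedded zero included.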

If you want a more thorough example of how to get wide characters from a stream of narrow characters, take a look at this:

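#include <string>

//This is a heavily nerfed version of a function in one of my projects. I think
//it conveys the message, though. ENCODING_T, readfile() and the UniFrom*()
//conversion functions are defined elsewhere in that project.
std::wstring getScript(const std::string &scriptName,ENCODING_T encoding){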
    //Automatically opens the file as binary, fills a buffer, closes the file,
    //then returns the buffer as an std::string.
    std::string buffer=readfile(scriptName);
    switch (encoding){
        //Each of these functions takes a const std::string & and performs the
        //corresponding conversion to native std::wstring:
        case ISO_8859_1_ENCODING:
            return UniFromISO88591(buffer);
        case UCS2_ENCODING:
            return UniFromUCS2(buffer);
        case UTF8_ENCODING:
            return UniFromUTF8(buffer);
        case SJIS_ENCODING:
            return UniFromSJIS(buffer);
        default:
            return L"";
    }
}
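The conversion functions themselves aren't shown. As a sketch under stated assumptions, two of them could look like this; `latin1ToWide` and `ucs2LeToWide` are hypothetical stand-ins for UniFromISO88591 and UniFromUCS2, not the author's code, and the UCS-2 one assumes little-endian input with no byte-order mark:

```cpp
#include <cstddef>
#include <string>

// ISO-8859-1 maps each byte directly to the code point with the same value
// (U+0000..U+00FF), so widening is a per-byte copy. Casting through
// unsigned char keeps bytes >= 0x80 from being sign-extended.
std::wstring latin1ToWide(const std::string &bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    return out;
}

// UCS-2 little endian: every two bytes form one 16-bit code unit,
// low-order byte first. A trailing odd byte is ignored.
std::wstring ucs2LeToWide(const std::string &bytes) {
    std::wstring out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<wchar_t>((hi << 8) | lo));
    }
    return out;
}
```

latin1ToWide("A\xE9") gives L"A\u00E9" (an A followed by é), and ucs2LeToWide applied to the four bytes 41 00 E9 00 gives the same string. UTF-8 and Shift JIS need real multi-byte decoding and are left out here.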



hypercube1 (66)

It seems highly odd that C++ fails under such common circumstances in UNICODE.

Talk to Bjarne.


Topic archived. No new replies allowed.
