Ah-ha! I knew it couldn't be the compiler!
Well, here's what I think:
std::basic_ifstream<T>::read() will read one byte at a time from the file and fill each of the Ts in the passed buffer with that byte. This is the only behavior that makes sense to me, since the size of T is only guaranteed to be >=8 bits. Since the size is not guaranteed to be any greater, the template can't assume that the instance will be able to write past bit 7, so bits 8 and above will always be empty.
This means that you can safely pass an array of wchar_t, but given that sizeof(wchar_t) is very likely to be at least sizeof(char)*2, this would be a waste.
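If you want to check this for yourself, here is a rough sketch. It assumes the default "C" locale and plain ASCII bytes; other locales, or bytes above 0x7F, may well behave differently. The function name and the idea of writing a scratch file first are just mine, for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes `bytes` to `path`, then reads them back through
// a wide stream, so you can inspect what ends up in each wchar_t.
std::vector<wchar_t> roundTripWide(const std::string &path, const std::string &bytes) {
    {
        std::ofstream out(path.c_str(), std::ios::binary);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    } // the output file is closed here

    std::basic_ifstream<wchar_t> in(path.c_str(), std::ios::binary);
    std::vector<wchar_t> buffer(bytes.size());
    in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // Keep only the elements that were actually filled.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With ASCII input, each element of the result should hold exactly one byte of the file, which is the waste I mentioned: sizeof(wchar_t) bytes of memory per byte read.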
As for the encoding, you don't need to worry about it. Encodings have no meaning when dealing with binary files. A binary file is merely a stream of bytes, which, once you've loaded it, you can interpret any way you want.
For example, the stream 41 00 could be interpreted as:
1. A C string in ISO-8859-1.
2. A C string in UTF-8.
3. The character U+4100 in UCS-2, big endian.
4. The character U+0041 in UCS-2, little endian.
5. None of the above.
So, as you can see, it doesn't really matter how you read files. You, the programmer, are the one assigning meaning to the stream.
This is the way it should be. Otherwise, you'd be limited by how the language or the system chooses to interpret the stream. This is why I never open files as text anymore, only as binary.
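To make the 41 00 example concrete, here is a minimal sketch of three of those interpretations applied to the same two bytes. The function names are made up for this post; the point is only that the bytes stay the same and the code decides what they mean.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Treats the first two bytes as one UCS-2 code unit, big endian.
std::uint16_t ucs2BigEndian(const std::string &bytes) {
    assert(bytes.size() >= 2);
    return static_cast<std::uint16_t>(
        (static_cast<unsigned char>(bytes[0]) << 8) |
         static_cast<unsigned char>(bytes[1]));
}

// Treats the first two bytes as one UCS-2 code unit, little endian.
std::uint16_t ucs2LittleEndian(const std::string &bytes) {
    assert(bytes.size() >= 2);
    return static_cast<std::uint16_t>(
        (static_cast<unsigned char>(bytes[1]) << 8) |
         static_cast<unsigned char>(bytes[0]));
}

// Treats the bytes as a C string: everything up to the first zero byte.
std::string asCString(const std::string &bytes) {
    return std::string(bytes.c_str());
}
```

Given a buffer holding the bytes 41 00, ucs2BigEndian() yields 0x4100, ucs2LittleEndian() yields 0x0041, and asCString() yields "A", which is the same in ISO-8859-1 and UTF-8.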
If you want a more thorough example of how to get wide characters from a stream of narrow characters, take a look at this:
#include <string>

//This is a heavily nerfed version of a function in one of my projects. I think
//it conveys the message, though.
//ENCODING_T, readfile(), and the UniFrom*() functions are defined elsewhere in
//the project.
std::wstring getScript(const std::string &scriptName,ENCODING_T encoding){
    //Automatically opens the file as binary, fills a buffer, closes the file,
    //then returns the buffer as an std::string.
    std::string buffer=readfile(scriptName);
    switch (encoding){
        //Each of these functions takes a const std::string & and performs the
        //corresponding conversion to native std::wstring:
        case ISO_8859_1_ENCODING:
            return UniFromISO88591(buffer);
        case UCS2_ENCODING:
            return UniFromUCS2(buffer);
        case UTF8_ENCODING:
            return UniFromUTF8(buffer);
        case SJIS_ENCODING:
            return UniFromSJIS(buffer);
        default:
            return L"";
    }
}
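Since readfile() and the conversion functions aren't shown above, here is a rough idea of what the two simplest ones could look like. These are stand-ins I'm writing for this post under different names (readFileBytes and wideFromLatin1), not the actual code from the project, and they skip error reporting.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Stand-in for readfile(): opens the file as binary and returns its bytes.
// Returns an empty string if the file can't be opened.
std::string readFileBytes(const std::string &path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Stand-in for an ISO-8859-1 conversion: every byte maps to the code point
// with the same numeric value, so each byte is simply widened.
std::wstring wideFromLatin1(const std::string &bytes) {
    std::wstring result;
    result.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes above 0x7F don't turn negative.
        result.push_back(static_cast<wchar_t>(static_cast<unsigned char>(bytes[i])));
    }
    return result;
}
```

The other encodings need real decoding logic (multi-byte sequences for UTF-8 and SJIS, byte order for UCS-2), but the shape is the same: raw bytes in, std::wstring out.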
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format