Using the fstream class in UNICODE?

Alright, I have posted on several of my regular programming forums and nobody seems to be able to answer this for me due to only being able to locate ANSI (single-byte) examples. I am using UNICODE wide-character (wchar_t, 2 bytes in Windows) C++ since ANSI is outdated and most of your modern programs are in UNICODE.

Anyway, I need to know two basic things. The first is whether or not there are any differences between fstream.get and fstream.read, and the second is if the UNICODE version of fstream.get reads in one byte at a time, or one wchar_t at a time? I ask the second question because the prototype is declared as "::get(wchar_t* _Str, int _Count)". Is "_Count" a count of wchar_t's, or a count of bytes? I need to be able to read in one byte at a time since I am working with binary files created years ago in DOS.

jsmith (5804)

_Count would be a count of the number of wchar_ts.

FF7 was a great game, if that is what you based your user name on.
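To illustrate the point about _Count, and the first question about get versus read, here is a minimal sketch using std::wistringstream as a stand-in for a wide file stream. The helper names units_read_by_get and units_read_by_read are made up for this example. On a wide stream both functions count wchar_t units, not bytes; get stops at a newline (or after count-1 units) and null-terminates the buffer, while read copies exactly the requested number of units with no terminator.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: how many wchar_t units get() extracts
// into a buffer that has room for `count` elements.
std::streamsize units_read_by_get(const std::wstring& text,
                                  wchar_t* buf, std::streamsize count) {
    std::wistringstream in(text);
    in.get(buf, count);   // stops at L'\n' or after count-1 units, then adds L'\0'
    return in.gcount();
}

// Hypothetical helper: how many wchar_t units read() extracts.
std::streamsize units_read_by_read(const std::wstring& text,
                                   wchar_t* buf, std::streamsize count) {
    std::wistringstream in(text);
    in.read(buf, count);  // copies up to `count` units, no terminator added
    return in.gcount();
}
```

With the input L"abcdef" and a count of 4, get stores three characters plus a terminator, while read stores four characters.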

Sephiroth (48)

So how can I read in a specific number of bytes at a time in a UNICODE project? For example, the file starts off with a four-byte header consisting of two two-byte shorts (DOS shorts). How could I get two bytes into a short, or four bytes into an int? This is my problem.

Yes, I have been using this username since 1996 and it was due to Sephiroth in the original release of FF7 on the PSX. I would have changed by now, but everybody knows me as Seph, so I am stuck.

helios (17574)

Depends on the endianness of the file.

http://en.wikipedia.org/wiki/Endianness

If the file is big endian, then (byte[0]<<8)|byte[1]; if the file is little endian, byte[0]|(byte[1]<<8). You should be able to extrapolate that to 32 bits.
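The shifts above, extended to 32 bits, might look like the following sketch. The function names are invented for this example, and it assumes a compiler that provides <cstdint> (newer than the VS2005 discussed in this thread). The bytes are taken as unsigned char so that values above 0x7F are not sign-extended before shifting.

```cpp
#include <cassert>
#include <cstdint>

// Little endian: least significant byte comes first in the file.
inline std::uint16_t read_u16_le(const unsigned char* b) {
    assert(b != nullptr);
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}

// Big endian: most significant byte comes first in the file.
inline std::uint16_t read_u16_be(const unsigned char* b) {
    assert(b != nullptr);
    return static_cast<std::uint16_t>((b[0] << 8) | b[1]);
}

inline std::uint32_t read_u32_le(const unsigned char* b) {
    assert(b != nullptr);
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

inline std::uint32_t read_u32_be(const unsigned char* b) {
    assert(b != nullptr);
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         |  static_cast<std::uint32_t>(b[3]);
}
```

Building the value from individual bytes this way gives the same result regardless of the endianness of the machine running the code.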

Sephiroth (48)

So you're telling me that I have to somehow read the binary data into an array of wchar_t's and then pull out what I need? If that's the case I am lost worse now than when I began. I have no clue how to do that since sometimes wchar_t is two bytes and sometimes it is four bytes. Not only that, but how do I read in data from a file that is an odd size? For example, how would I read in a file that is three bytes in size? Two wchar_t's would read past EOF, and a single four-byte wchar_t would fail as well.

This seems highly odd that C++ fails under common circumstances in UNICODE. There has GOT to be a way to read only one byte at a time without using ANSI. I fail to believe that C++ was purposely destroyed in this manner when using UNICODE.

helios (17574)

If that's the case I am lost worse now than when I began. I have no clue how to do that since sometimes wchar_t is two bytes and sometimes it is four bytes.

Then how did you come to the conclusion that you need to read to a wchar_t *? std::ifstream::read() will only take a pointer to an array of chars, anyway, so...

Could you please stop saying "UNICODE"?
UNICODE is a macro defined by the compiler to tell windows.h to use the wide-character versions of the WinAPI functions.
Unicode is a standard related to the Universal Character Set.

Sephiroth (48)

Both read and get require wchar_t when UNICODE is defined in the project. I already thought about trying read instead of get and discovered that they were identical. I can take a screenshot of the declaration popup of each if you'd like, but they're both identical on my end, which is yet another cause for confusion here.

And yes I understand that UNICODE is simply a definition used to make Windows use the wide-character functions and such, but I also see it used in Linux and X programming, so it isn't just for Windows unless the code I have viewed in Linux was written by people who simply declare it for no reason.

By the way helios, does "Elf" mean anything to you?

::read(wchar_t* _Str, std::streamsize _Count)

helios (17574)

Both read and get require wchar_t when UNICODE is defined in the project.

Where are you getting this? I've never seen anything like that, and I've written quite a few programs that defined UNICODE and read() took a char *. http://www.cplusplus.com/reference/iostream/istream/read/ does not mention this, either. "UNICODE" doesn't appear once in The C++ Programming Language.

Other than the beings commonly depicted as having pointy ears and living in the woods, no.
ELF, however, does.

Disch (13742)

This seems highly odd that C++ fails under common circumstances in UNICODE

Join the club.

I'm terribly frustrated by the standard lib's obliviousness to Unicode, and its failure to support it.

What you may need to do is determine how the text in the file is stored (assuming it will always be stored in the same manner). This could be UTF-8, UTF-16LE, or UTF-16BE (or I guess some manner of UTF-32, but that'd be silly).

If UTF16, then you would read 2-byte pairs and do the appropriate bitshifting that helios illustrated in a previous post. UTF-8 is a bit more complicated.

I think there are ways to convert from UTF-8. Like with mbstowcs() or something. To be honest I'm not entirely sure how those functions work (or how well they work).

Sephiroth (48)

I am getting it from the definition in VS2005. Both "fstream::get" and "fstream::read" are identical in declaration, although "::get" has many overloads, where I only see one method for calling "::read". However, they all require wchar_t, and using "variable << fstream" also seems to read in two bytes. I've been stuck on this for a week now, but you'd know that if you checked the D2 forum more frequently! :p

helios (17574)

I'm using VC++ 2008 and I'm not getting anything like that. I can't even find any compiler option that will typedef std::ifstream to wide as default.

If you need to make sure that you only read narrow characters, you could do this:
typedef std::basic_ifstream<char> std::nifstream;

Sephiroth (48)

That presents me with compiler errors. Is there some better way to do this in C++? I thought fstream was there for working with files, but right now it seems inferior to using the old C version which made use of "fopen", "fread", and "fclose". I'm forcing myself to use Unicode in all of my new projects because ANSI is dead, and I want my projects to be compatible with more machines around the globe. This however, has been my first real hurdle.

Disch, I am reading binary files, not text files. I can easily read text files using fstream. My problem is that I am working on reading data from an older DOS game, which is binary data. The project is basically a new, OpenGL version of the game engine, but it isn't going anywhere due to not being able to read in the binary files!
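For the four-byte header described earlier in the thread, a narrow std::ifstream opened in binary mode reads one byte per char, whatever the project's character-set setting. The sketch below is only an illustration: the Header struct, its field names, and read_header are hypothetical, and it assumes the two shorts are little endian (as x86 DOS programs would normally have written them) and a compiler that provides <cstdint>.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical layout: two 16-bit little-endian fields.
struct Header {
    std::uint16_t first;
    std::uint16_t second;
};

// Returns false if the file cannot be opened or holds fewer than four bytes.
bool read_header(const std::string& path, Header& out) {
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) return false;

    char raw[4];                           // four single bytes
    if (!file.read(raw, sizeof raw)) return false;

    // Reinterpret as unsigned so bytes above 0x7F are not sign-extended.
    const unsigned char* b = reinterpret_cast<const unsigned char*>(raw);
    out.first  = static_cast<std::uint16_t>(b[0] | (b[1] << 8));
    out.second = static_cast<std::uint16_t>(b[2] | (b[3] << 8));
    return true;
}
```

The same pattern with a one-element buffer, or file.get(c) with a char c, reads a single byte, so a file with an odd number of bytes is not a problem.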

helios (17574)

Try

namespace std{
typedef std::basic_ifstream<char> nifstream;
};

ANSI, which I'm guessing is another way of saying ISO-8859-1, which is another way of saying UCS-1, is not dead.

It's perfectly possible to support wide characters without using whatever silly #define you're using. The only points where wideness matters are input and output. Input can only come from files, the console, or a GUI text box. Files should only be read into bytes, so conversion to wide characters should be done one level above. The console shouldn't be assumed to use non-ASCII, so no Unicode there, either. GUIs (e.g. WinAPI, Qt, wxWidgets) are the only point where some definition would matter. Some compilers don't #define UNICODE by default, so you accidentally end up taking input into narrow characters.

I don't know why you're wasting your time defining stuff that only gets in the way.
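As a rough illustration of doing the conversion "one level above" the file read: the bytes are read as plain chars first, and only then widened. This sketch assumes the bytes are ISO-8859-1, where each byte value equals its Unicode code point; other encodings such as UTF-8 need a real decoder. The function name widen_latin1 is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Widen ISO-8859-1 bytes to wide characters.
// Assumption: the input really is ISO-8859-1, so byte value == code point.
std::wstring widen_latin1(const char* bytes, std::size_t count) {
    assert(bytes != nullptr || count == 0);
    std::wstring out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Go through unsigned char so bytes above 0x7F stay positive.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(bytes[i])));
    }
    return out;
}
```

The file-reading code stays byte-oriented, and only the text that is actually shown to the user passes through a function like this.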

Sephiroth (48)

It's not me defining it, it's VS2005. I set up the project as a Unicode project. I haven't had problems until now. The problem is that the data I am trying to read was written by a 16-bit DOS program, so I have to read a LOT of single bytes. That's where my problem is. I have yet to find a method to read one single byte from a file, which seems incredibly stupid, since I know I can do it if I am using plain old 8859-1, as you prefer to put it.

This situation to me is like a new car that can only do 55MPH or faster. It can't operate at 35MPH in a school zone for some reason. Older cars can, but this new one can't. Old C can read and write a single byte, and even 8859-1 C++ can, but the modern C++ using wide-characters seems to have lost that ability.

Oh and that code still produces errors.

helios (17574)

What is the exact name of the compiler option you're using, including its location in the project properties dialog?

Sephiroth (48)

Configuration properties, general, "Use Character Set". I have it set to Unicode. You can also set it to multi-byte, or none. None results in 8859-1.

helios (17574)

"None" is ISO-8859-1.
I've always used it there. Setting that option to "Unicode" defines UNICODE; setting it to "multi-byte" defines MBCS, IINM. The definition of neither changes the standard typedefs.
I can only guess there's something wrong with your installation, or that you're actually trying to use std::wifstream.

Sephiroth (48)

No, that is how Visual Studio works. Try it yourself. Once you select Unicode, it begins using the wide versions of everything. This includes Windows functions, where their names change over unless you explicitly call them. For example, "MessageBox()" is defined as "MessageBoxA" with nothing selected, but it is defined as "MessageBoxW" with Unicode selected. Nothing is wrong with this installation, it has always been this way.

helios (17574)

I don't need to test it.

I've always used it there. [...] The definition of neither changes the standard typedefs.

std::ifstream::read() takes a char * as its first parameter, and std::wifstream::read() takes a wchar_t * as its first parameter. If this changed in any way based on compiler definitions, then the compiler would not be compliant.

The declarations of WinAPI functions are not standard, and thus change according to some definitions.

This should compile on any compiler without errors or warnings:

#ifndef UNICODE
#define UNICODE
#endif

#include <iostream>
#include <fstream>

int main(){
	std::ifstream file;
	char a[10];
	file.read(a,10);
	return 0;
}
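The same claim can also be checked at compile time. The sketch below assumes a C++11 compiler (static_assert and <type_traits> are not available in VS2005), and the helper narrow_unit_size is invented for this example. If defining UNICODE changed the standard stream typedefs, the assertions would fail to compile.

```cpp
#ifndef UNICODE
#define UNICODE
#endif

#include <cassert>
#include <cstddef>
#include <fstream>
#include <type_traits>

// std::ifstream is a typedef for the char specialization, with or without UNICODE.
static_assert(std::is_same<std::ifstream, std::basic_ifstream<char> >::value,
              "std::ifstream must be basic_ifstream<char>");
static_assert(std::is_same<std::ifstream::char_type, char>::value,
              "std::ifstream::read() works on char");
static_assert(std::is_same<std::wifstream::char_type, wchar_t>::value,
              "std::wifstream::read() works on wchar_t");

// Hypothetical helper: size in bytes of one unit read by std::ifstream.
inline std::size_t narrow_unit_size() {
    return sizeof(std::ifstream::char_type);
}
```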

Sephiroth (48)

Could it be because I am not including "iostream"? I only include "fstream" since that includes "iostream" itself. I do all of my code that way. For example, if I have class "A", and class "B" includes "A", I only include "B" in my main file and gain access to both classes.

As for not being compliant, we ARE talking about MSVC, and even though VS2005 made huge leaps towards being compliant, I am sure it isn't perfect! However, to prove my point, I took a screenshot. Check the link below and tell me what you think.

http://www.geocities.com/shrew_ii/fstream.jpg
