Question on reading binary data

Hi all,

I want to create a class to read and write a binary file format. I'll be using ifstream's read function to get data out of the file, and I was wondering about multibyte chars. If chars are multibyte then read may read more than 1 byte per character, right? So is there a way to guarantee the size of the read in bytes and not in chars?
Playing with multibyte streams is not really safe. The standard library doesn't give much direction on them, and I still haven't really figured out the locale conversion system (which should make it work OK, but still...)

The problem, I think, is that most people can't figure the thing out, and so it just doesn't work properly.


With the usual fstream classes (parameterized over char, where a char is 1 byte), read() will always work on bytes (aka chars). Once you have read the bytes in, you can decode any multibyte characters they contain.
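
To make the read() point concrete, here is a minimal sketch of reading a fixed number of bytes. The helper name read_bytes is made up for illustration; it assumes the file is opened in binary mode and uses gcount() to report how many bytes actually arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): read up to `count` bytes from `path`.
// The returned vector is shorter than `count` if the file ends early,
// and empty if the file cannot be opened.
std::vector<char> read_bytes(const std::string& path, std::size_t count) {
    // Binary mode: no newline translation, bytes are delivered as stored.
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};

    std::vector<char> buffer(count);
    // read() takes a count of chars, and sizeof(char) is always 1 byte.
    in.read(buffer.data(), static_cast<std::streamsize>(count));

    // gcount() is the number of chars (bytes) the last read() extracted.
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    assert(got <= count);
    buffer.resize(got);
    return buffer;
}
```

Calling read_bytes("data.bin", 16) on a 5-byte file would give back a 5-element vector, so the caller can tell a short read from a full one by checking size().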

There are a variety of libraries to do this, but I recommend the iconv library to get started.
http://en.wikipedia.org/wiki/Iconv
It is pretty simple to use.
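
For a rough idea of what using it looks like, here is a sketch built on the POSIX iconv_open / iconv / iconv_close calls. The wrapper name convert_encoding and the 256-byte chunk size are my own choices, and it assumes a platform where iconv() takes a non-const char** (glibc does; some older systems differ). It also skips the final flush that stateful encodings need.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper (not from the thread): convert bytes from one encoding to another.
std::string convert_encoding(const std::string& input, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target encoding first
    if (cd == (iconv_t)(-1)) throw std::runtime_error("iconv_open failed");

    std::string in_copy = input;        // iconv wants a mutable input pointer
    char* in_ptr = &in_copy[0];
    std::size_t in_left = in_copy.size();
    std::string output;
    char chunk[256];

    while (in_left > 0) {
        char* out_ptr = chunk;
        std::size_t out_left = sizeof chunk;
        const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        const int err = errno;          // save before anything else can change it
        output.append(chunk, sizeof chunk - out_left);
        // E2BIG only means the chunk filled up; loop again for the rest.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv conversion failed");
        }
    }
    iconv_close(cd);
    return output;
}

// Usage sketch: the two UTF-8 bytes of 'é' become one UTF-16 unit, 0x00E9.
void demo_iconv() {
    const std::string utf16 = convert_encoding("\xC3\xA9", "UTF-8", "UTF-16LE");
    assert(utf16.size() == 2);
    assert(static_cast<unsigned char>(utf16[0]) == 0xE9 && utf16[1] == 0);
}
```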

Good luck!
(parameterized over a char == 1 byte)

This is what I was confused about ... I was under the impression that chars are a minimum of 1 byte, and that they may be as large as MB_CUR_MAX. Is that wrong?
they may be as large as MB_CUR_MAX. Is that wrong?
Yes. That's something completely different that has nothing to do with the size of char.
A multibyte character is something like the UTF-8 encoding of the Unicode character 'é': the two-byte sequence 0xC3 0xA9, which shows up as "é" when displayed as Latin-1.
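
A small sketch of the difference, in case it helps: the UTF-8 form of 'é' is two chars long even though it is one character. The helper utf8_length is made up for illustration and assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): count code points in a UTF-8
// string by skipping continuation bytes, which all look like 10xxxxxx.
// No validation is done; malformed input gives a meaningless count.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++n;  // lead byte or plain ASCII
    }
    return n;
}

// Usage sketch: one character, two chars.
void demo_e_acute() {
    const std::string e_acute = "\xC3\xA9";  // UTF-8 bytes of 'é'
    assert(e_acute.size() == 2);             // size() counts chars, i.e. bytes
    assert(utf8_length(e_acute) == 1);       // but it is a single code point
}
```

So the "multi" in multibyte is about how many chars one character takes, not about how big a char is.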
Ok, was the part about chars possibly being larger than 1 byte wrong? If so, is there a way to read in a specific number of bytes and not chars? I realize that "char == 1 byte" is probably a very safe assumption, but it's still an assumption, right?
TC++PL says that chars are almost universally 8 bits long. From what I've read, the language definition treats "character" as a synonym of "byte". The above sentence can be rewritten as "bytes are almost universally 8 bits long", which is also true.
To answer your original question (and repeat what Duoas has already said), read() will always read bytes no matter what. Whether these bytes are octets is a different story, but unless the system is ancient or arcane (e.g. an embedded CPU), they are.
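
If you'd rather not rely on the octet assumption silently, one option is to have the compiler check it. This is only a sketch: CHAR_BIT comes from <climits>, and the helper bits_in is made up for illustration.

```cpp
#include <climits>   // CHAR_BIT: number of bits in a byte (a char)
#include <cstddef>

// Always true: sizeof is measured in units of char.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// True on almost every platform; compilation fails on the exceptions.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes (octets)");

// Hypothetical helper (not from the thread): size of a type in bits.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```

With those two checks in a header of the file format class, a build on an unusual platform stops with a clear message instead of producing files with the wrong layout.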