Problem writing 0x0A

I've been experimenting with writing to a Unicode (UTF-8) text file. So far I've been entering all of the byte values by hand, because I'm only using two characters: the full block █, which is 0x88 0x25, and a regular space, which is 0x20 0x00.
However, I'm at the end of a line and want to continue on the next one, so I write 0x0D, 0x00, 0x0A and 0x00 to the file. But whenever I do that and look at the file in a hex editor, I see 0D 00 0D 0A 00. I believe this is the compiler's fault: it seems to treat the 0A as a newline (like \n) and writes both 0D and 0A in one shot, as it would for an ANSI text file.

Here's the problematic part:
txt.put (0x0D);
txt.put (0x00);
txt.put (0x0A);
txt.put (0x00);

I've also tried using txt.write and txt << (char)0x0A; but neither works.

I was poking around and thought maybe if I wrote
txt.put (0x0D);
txt.put (0x0A);
txt.put (0x00);

and then went back and replaced the second 0D (which is automatically placed there) with 00 it would work, but I haven't found a way to do that yet.

So if someone would like to help, I want to know either how to directly get 0A into the file without the extra 0D being tacked on
OR
How to go back and change the 0D to an 00 and then move on to input more stuff.

Thanks!
- Check that you opened the file as binary. File streams default to text mode, so be sure to open it with ios::binary in the flags. On Windows, text mode automatically converts \n (0A) to \r\n (0D 0A) on output, which matches the extra 0D you're seeing. A lone \r is written as is, and null characters aren't affected.

- Spaces and new lines shouldn't have a null character after them if you're using UTF-8. In fact, 0 probably shouldn't appear anywhere in the file unless you're using it as a special marker for something.

- You shouldn't have to write UTF-8 characters one byte at a time like this. Provided you save your source file as UTF-8, you should be able to use string literals:

 
txt << "Some unicode chars:  ほぜしまそいあ  \nThis is on a new line!";


Provided you're outputting in binary mode and your .cpp file is saved as UTF-8, I see no reason why this wouldn't work unless the compiler mangles the strings (which it shouldn't). Give it a try, it should make your life a lot easier.
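To make the binary-mode point concrete, here is a minimal sketch (not the original poster's code). The names write_bytes and read_bytes and the file name in the usage note are made up for illustration. It writes the four bytes from the question with ios::binary set and reads them back, so you can check that no extra 0D was added.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write raw bytes to `path` in binary mode, so the runtime performs no
// newline translation on the way out. (Hypothetical helper.)
void write_bytes(const std::string& path,
                 const std::vector<unsigned char>& bytes) {
    std::ofstream txt(path, std::ios::out | std::ios::binary);
    assert(txt.is_open() && "could not open output file");
    for (unsigned char b : bytes) {
        txt.put(static_cast<char>(b));
    }
}

// Read the whole file back in binary mode to see what actually landed on
// disk. (Hypothetical helper.)
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Calling write_bytes("out.txt", {0x0D, 0x00, 0x0A, 0x00}) and then read_bytes("out.txt") should give back the same four bytes. Without ios::binary on Windows, the 0A would be expanded to 0D 0A.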
Okay, I'll try opening in binary mode. I kept away from opening in binary mode in the first place because it was a bit overwhelming at first, but I'll go back and try it out again. Also, if I save my .cpp file as UTF-8, my compiler doesn't compile it (it was one of the first things I tried).

Thanks a lot, I'll report back maybe tomorrow with my results!
What compiler are you using? Both VC++ and GCC can compile UTF-8 files just fine. I'm a bit shocked that there's a compiler out there that would have a problem with it.

EDIT:

Also if you can't get UTF-8 cpp files working no matter what you do -- you can still use string literals, it's just not as nice:

// full block is 0x88 0x25, so you can use the escape sequences \x88\x25
//  but because a \x escape keeps consuming any hex digits that follow it, you may need to put it in its own quotes:

txt << "This is a full block: " "\x88\x25" ", isn't that neat?";

// or if you want to make a constant variable for it...
static const char* fullblock = "\x88\x25";

txt << "Another way to print a full block: " << fullblock << ", weee";

// also....
txt << "\r\n";  // <-- 0D, 0A, but easier and more readable 



EDIT again:


Wait a minute.... 0x88 0x25 is not valid UTF-8. The code is U+2588, but in UTF-8 it's expressed as E2 96 88.

I think you were outputting UTF-16LE before, not UTF-8 (that explains all the zeros!). Anyway if you want to stick with UTF-8, everything I said still applies, but fullblock would be "\xE2\x96\x88" instead.
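If it helps to see where those byte values come from, here is a rough sketch of both encodings for a single code point. The function names to_utf8 and to_utf16le are made up for this example, and it assumes the code point is in the Basic Multilingual Plane (U+0000 to U+FFFF, no surrogates), which covers the full block and the space.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one BMP code point as UTF-8 (1 to 3 bytes). Hypothetical helper;
// code points above U+FFFF are deliberately not handled here.
std::string to_utf8(std::uint32_t cp) {
    assert(cp <= 0xFFFF);
    std::string out;
    if (cp < 0x80) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one BMP code point as UTF-16LE: low byte first, then high byte.
// This is why plain ASCII characters are followed by a 00 byte.
std::string to_utf16le(std::uint32_t cp) {
    assert(cp <= 0xFFFF);
    std::string out;
    out += static_cast<char>(cp & 0xFF);
    out += static_cast<char>((cp >> 8) & 0xFF);
    return out;
}
```

With these, to_utf8(0x2588) gives the bytes E2 96 88 and to_utf16le(0x2588) gives 88 25, while a space (0x20) is a single 20 byte in UTF-8 but 20 00 in UTF-16LE.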
Ah, yes, now I see what the issue is. Well, since I have everything written out for UTF-16LE, I'll keep it that way. Also, the method I used before works perfectly in binary mode, without any modifications.

The "root" of this problem comes from how I extracted the hex values for the characters. I literally pasted the characters I wanted into notepad and saved as "Unicode" (which apparently is UTF-16LE), and then read the file in a hex editor. As a thank you, here's (part of) a barcode generated by my program!
█ █   ██ █  █  ██ █   ██   ██ █  █  ██  ██  █ █ █ ███ █  ███ █  ██  ██ ██  ██ ██ ██  █    █ █ █
█ █   ██ █  █  ██ █   ██   ██ █  █  ██  ██  █ █ █ ███ █  ███ █  ██  ██ ██  ██ ██ ██  █    █ █ █
█ █   ██ █  █  ██ █   ██   ██ █  █  ██  ██  █ █ █ ███ █  ███ █  ██  ██ ██  ██ ██ ██  █    █ █ █
█ █   ██ █  █  ██ █   ██   ██ █  █  ██  ██  █ █ █ ███ █  ███ █  ██  ██ ██  ██ ██ ██  █    █ █ █

If you were to scan that at Wal-Mart, a 25 Tissue pack of Magnivision Lens Cleaning Tissues would pop up as the item.
I also got it to output into a bitmap file :)