To unicode or not to unicode...

Alright, so the title of the thread may be - slightly - grammatically incorrect. Apolo... I mean, stick it, English Nazis!

I've just started delving into the Win32 API with the help of "Programming Windows" (Petzold), and what I was wondering is this: how often is Unicode actually used in practice?

I understand how to program with it (I believe), but I was curious as to how many people actually do:

i.e., using TCHAR and calling the generic functions that resolve to either the ANSI ('A') or the Unicode ('W') version depending on whether UNICODE is defined...
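Roughly speaking, I mean the machinery that looks something like this (a simplified sketch of what <windows.h>/<tchar.h> do, not the actual definitions):

// Simplified sketch of the TCHAR machinery; the real headers are more involved.
#ifdef UNICODE
    typedef wchar_t TCHAR;            // wide-character (UTF-16) build
    #define TEXT(s)    L##s
    #define MessageBox MessageBoxW    // generic name maps to the 'W' API
#else
    typedef char TCHAR;               // "ANSI" build
    #define TEXT(s)    s
    #define MessageBox MessageBoxA    // generic name maps to the 'A' API
#endif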

Any views on the topic would be appreciated; however, please refrain from "I don't use it, it's shit" responses :)
Use Unicode always. That's my opinion. Other Microsoft technologies like COM use Unicode only, so it will make your life easier, plus it is not inconceivable that Microsoft will stop shipping ANSI APIs. I think there are already one or two that are Unicode-only (I don't remember which ones, though).
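Just to illustrate the COM point, it boils down to something like this (a minimal sketch; error checking omitted):

#include <windows.h>
#include <oleauto.h>

int main()
{
    // COM string types are UTF-16 only: BSTR/OLECHAR are wide characters,
    // so an ANSI build still has to convert at every COM boundary.
    BSTR s = SysAllocString(L"COM only speaks UTF-16");
    // ... pass s to some COM method expecting a BSTR ...
    SysFreeString(s);
    return 0;
}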

So simply put: Always Unicode.
I'm of the same opinion as webJose.

I usually don't even use the TCHAR nonsense because TCHARs just add an additional level of complexity. I stick to wchar_t and the 'W' versions of functions/structs/etc.
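In other words, something like this (a trivial sketch, but you get the idea):

#include <windows.h>

int main()
{
    // Call the 'W' function explicitly with wide-string literals;
    // no TCHAR or TEXT() macros needed, and UNICODE doesn't matter.
    MessageBoxW(NULL, L"Hello, Unicode!", L"Example", MB_OK);
    return 0;
}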

However... I seldom work with WinAPI directly. When I do, I usually wrap it in another layer so I can use UTF-8 internally in my program. Then I just convert to UTF-16 when talking to WinAPI.
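The conversion layer comes down to something like this (a rough sketch; error handling omitted):

#include <windows.h>
#include <string>

// Convert an internal UTF-8 string to UTF-16 right before calling a 'W' API.
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();

    // First call asks how many wchar_ts are needed, second call converts.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), NULL, 0);
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &utf16[0], len);
    return utf16;
}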

Alternatively, you could change the code page and give WinAPI UTF-8 directly through the 'A' series of functions. At least in theory; personally I've never actually tried it.
It's a pain, Get Funky.

I do all three, i.e., use chars, wchar_ts, and TCHARs. The whole TCHAR thing is incredibly ugly, of course. After fighting it for many, many years, I finally gave in and made the momentous decision to absolutely give up chars and wchar_t and stick to TCHARs. Unfortunately, the will is only so strong, and I have periodic relapses.

It seems to be the price of doing business in C++ for Windows though.
I absolutely HATE programming in Unicode, but since Microsoft and everybody else are going in that direction, and since it's clearly the wave of the future, as was said above, program in Unicode.

At least until they come up with something better, if they ever do.
Thanks for the replies, I'd sooner ask a lot of questions than be ignorant :)

@Disch:
- When you mentioned UTF-8 I had a bit of a look into it (I'd heard of it but never really paid much attention). Wouldn't it have the same issue where random access is difficult, since characters may consist of multiple bytes (i.e. one character is one byte and the next is two bytes)?

- Also, how do UTF-8/16/32 relate to using wchar_t and such, given that they support variable-width encoding; for example UTF-32, where each character is four bytes? (Since then we'd be using 32-bit integers rather than a short int like wchar_t.)

@Lamblion:
- Out of curiosity, why do you hate using unicode?

Lamblion wrote:
I absolutely HATE programming in Unicode,


Why? It's like a million times simpler than the alternatives.

One value = one codepoint. Every glyph can be represented at any time.

You don't have to worry about code pages or any other weird crap.

get funky wrote:
Wouldn't it have the same issue where random access is difficult, since characters may consist of multiple bytes (i.e. one character is one byte and the next is two bytes)?


That kind of thing is only a concern if you're doing text processing. For most purposes you don't need to distinguish between individual codepoints. 99% of the string work I do is just passing them around (they are input from either a file or from the user, and I don't have to modify them much outside of that).

And if you do need to really get down and dirty with them, you would probably want to convert them to UTF-32 somehow so they can be easily manipulated.
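If you ever do need per-codepoint access, the conversion is straightforward enough. Roughly something like this (a quick sketch; validation of malformed input is omitted):

#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 into 32-bit codepoints so each element is exactly one codepoint.
std::vector<uint32_t> Utf8ToCodepoints(const std::string& utf8)
{
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < utf8.size(); )
    {
        unsigned char c = utf8[i];
        uint32_t cp;
        std::size_t extra;                      // number of continuation bytes
        if      (c < 0x80) { cp = c;          extra = 0; }   // 1-byte sequence
        else if (c < 0xE0) { cp = c & 0x1F;   extra = 1; }   // 2-byte sequence
        else if (c < 0xF0) { cp = c & 0x0F;   extra = 2; }   // 3-byte sequence
        else               { cp = c & 0x07;   extra = 3; }   // 4-byte sequence
        ++i;
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i)
            cp = (cp << 6) | (utf8[i] & 0x3F);  // fold in continuation bytes
        out.push_back(cp);
    }
    return out;
}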

get funky wrote:
Also, how do UTF-8/16/32 relate to using wchar_t and such, given that they support variable-width encoding; for example UTF-32, where each character is four bytes?


That's kind of the thing with wchar_t. It just means "wide character" which in the context of Unicode means absolutely nothing.

On Windows, WinAPI uses wchar_t as a 16-bit type to hold UTF-16 data. On other platforms, though, wchar_t may be a different size (on *nix it is usually 32 bits wide), which is part of the reason why I don't really like to use it unless I'm interfacing with WinAPI.
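You can see the difference with a one-liner (trivial sketch):

#include <cstdio>

int main()
{
    // Implementation-defined: typically 2 bytes on Windows (UTF-16 code units)
    // and 4 bytes on most *nix toolchains (UTF-32 code units).
    std::printf("sizeof(wchar_t) = %u bytes\n",
                static_cast<unsigned>(sizeof(wchar_t)));
    return 0;
}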