Unicode Headache

This Windows Unicode business is about to drive me to the point of throwing this computer through the window! That said, I am trying to run a program from an old, pre-Unicode tutorial. But, I can't seem to get a straight story on how to convert. Do I use "TCHAR" or "wchar_t"? And, what about this bit of code?:

     indexint = GetSystemMetrics(sysmetrics[i].iIndex);
     bfrlen = swprintf(szBuffer, tcslen(szBuffer), "%5d", indexint);
     TextOut(hdc, Xpos, Ypos, szBuffer, bfrlen);


The problem is the middle line, the one that starts with "bfrlen=...". I have tried every variation of "printf" to print to a buffer. Every one gives me an error. Same with trying to get the character count for "szBuffer", to be used in TextOut. Should I even be using "TextOut"? "szBuffer[]" is a string of type TCHAR. Or should I be using "wchar_t"?
"An error" can mean many things.
Post the actual error message.
Also, exactly what type is szBuffer?

If you want to use Unicode, then make sure UNICODE is defined (before any includes) and use wchar_t instead of TCHAR.
And use wcslen instead of _tcslen.
And any string literals should have L before them: L"%5d".
can't seem to get a straight story
That's partly because Windows used to support the obsolete UCS-2 and called it "Unicode".
Now it actually supports Unicode, but unfortunately they did so by transitioning to UTF-16, which has a number of significant disadvantages relative to UTF-8 and UTF-32.

You'll want to read & write another Unicode format internally, translating to UTF-16 around the boundaries of API calls that aren't encoding-agnostic.
Thanks Dutch. Those tips helped to clear the errors. I also found that "lstrlen" will work. "szBuffer[]" is a text string, of the type "wchar_t"

Though it runs, I found a new problem. Of the list of constants, the routine only prints the first entry. All of the other values are there; they're just not printing. And I can't seem to find out why. Here is the entire loop:

    int Xpos1 = cxChar,
        Xpos2 = cxChar + (22 * cxCaps),
        Xpos3 = cxChar + (22 * cxCaps) + (40 * cxChar),
        Ypos, indexint, bfrlen;

    for (i = 0; i < NUMLINES; i++)
    {
        Ypos = cyChar * (1 + i);
        TextOut(hdc, Xpos1, Ypos, sysmetrics[i].szLabel, lstrlen(sysmetrics[i].szLabel));
        TextOut(hdc, Xpos2, Ypos, sysmetrics[i].szDesc, lstrlen(sysmetrics[i].szDesc));

        SetTextAlign(hdc, TA_RIGHT | TA_TOP);
        indexint = GetSystemMetrics(sysmetrics[i].iIndex);
        bfrlen = swprintf(szBuffer, lstrlen(szBuffer), L"%5d", indexint);
        TextOut(hdc, Xpos3, Ypos, szBuffer, bfrlen);
        SetTextAlign(hdc, TA_LEFT | TA_TOP);
    }


The first two "TextOut" calls print as they should. But, for some reason, only the first line displays the third call. "Sysmetrics" is an array of structures. "szLabel" is the macro defined by Windows. "szDesc" is a short text string, describing each entry. Unfortunately, I am unable to copy and paste the output.
This looks like a stitch up of code from Petzold's classic "Programming Windows, 5th Edition" - chapter 4. "An Exercise in Text Output."

One of the 3 SysMets programs.

https://github.com/recombinant/petzold-pw5e/tree/master/Chapter%2004%20An%20Exercise%20in%20Text%20Output

iIndex, szLabel and szDesc are struct data members defined in a custom header file, NOT defined by Windows.

Where the hell is this tutorial?
Yes, the code is out of Petzold's 5th Edition. His original code produced more errors than I could possibly list here---all related to Unicode. I spent hours tweaking the code, just to get it to compile. It now runs, but it doesn't run right. As I mentioned above, I cannot get the third entry to print more than the first line.
Someone has done a lot of the conversion already to modern practices in the Win32 API.
https://github.com/recombinant/petzold-pw5e

The errors were NOT related to Unicode, but from changes to the underlying Win32 API. The idea Unicode was the culprit is a red herring.
OK, I found the problem. I was using "swprintf" instead of "wsprintf". This was the result of bad info. While trying to track down an error, I read that "wsprintf" was supposedly no longer supported. I was directed to use "swprintf" instead.

This is just one example of the conflicting info out there--especially when it comes to handling character strings in Windows. Another is the question of using string macros like "TEXT" and "TCHAR". I have been told that it is better to use "L" and "wchar_t" respectively. Yet, I'm sure that someone else will now come along, and say the opposite. Is there a right answer?
TEXT and TCHAR are macros that translate to char or wide-char depending on whether UNICODE (and/or _UNICODE) is defined or not.

swprintf has a secure version that you should use instead, swprintf_s.
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-s-sprintf-s-l-swprintf-s-swprintf-s-l?view=vs-2019

wsprintf with the latest Win10 SDK is still alive and kicking.

For that matter there is <strsafe.h> that provides better string handling.
https://docs.microsoft.com/en-us/windows/win32/menurc/strsafe-ovw

If you go totally Unicode-only with Windows apps, there are other modifications you should make.

1. WinMain should be wWinMain.

2. There are quite a number of Win32 functions that come in ANSI and wide char versions, with a macro choosing which version to use.

TextOut, for example. ANSI - TextOutA, Unicode - TextOutW.
https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-textouta
Thanks Furry Guy. I know that both "TEXT" and "TCHAR" are macros for toggling the use of Unicode. I just wonder why so many folks say that I should never use them. Also, do all Windows string functions come with both an 'A' and a 'W' version? I'm no real fan of Unicode. I just seem forced to deal with it, as every example and tutorial that I come across, is written for the wide character set.
The only SURE way to know if there are separate ANSI/Unicode versions of a Win32 function is to delve into the SDK documentation on MSDN.

If you are using VS 2017 or 2019 you can hover your pointer over a function and a popup should tell you if the A or W version is being used.

so many folks say that I should never use them

Because they're lazy? They're overly-pedantic pricks? I don't agree with that "should not use" crap.

Trying to "unicodify" code that works with the ANSI/Unicode macros is IMO idiocy. It works, leave it the fuck alone! Fix what is broken, don't break it just to be "modern."

If you are fixing code to account for the changes to the Win32 API, especially for 64-bit, then do it!

If you are writing new Win32 API/Desktop code, then why not emphasize Unicode-only code? There has been no more need for ANSI since XP became the "default" Win OS on new PCs.

Petzold's code for the 5th edition was written when Windows was still a Frankenstein's monster of mixed 16- and 32-bit API code. Too bad his 6th edition went with C# and verdammt managed code. *SPIT*

For intellectual curiosity I occasionally think about taking older code I have written that was ANSI/Unicode swappable and "upgrading" to Unicode only.
Thanks Furry Guy. I know that both "TEXT" and "TCHAR" are macros for toggling the use of Unicode. I just wonder why so many folks say that I should never use them.
It's 2020, well past the point where it's acceptable to ignore basic support for multilingual and international users.

Writing code that's supposed to be portable between text encodings implies a burden on Q/A infrastructure, the build system, package maintainers, and on developers.

It doesn't make any sense to maintain the TCHAR/_T machinery if you're never going to utilize it anyway.
It's worth noting (as it hasn't been mentioned) that Windows natively uses 16-bit characters and the W API calls.

The A calls and the TCHAR/_T system were to help port Win16 code to Win32 (all those decades ago), and when you use them, you're incurring a translation cost to and from the native wide char interface.
To learn from Petzold you first need to convert all the sample API calls into today's equivalents.
That basically means more than 50% of the code.

i.e. all GDI APIs should be converted to DirectX, and there are also functions that work only with 32-bit compilers, etc.

And of course there is a very popular article about Unicode that gets shared around the internet:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
To learn from Petzold you first need to convert all the sample API calls into today's equivalents.
That basically means more than 50% of the code.

Someone has done most of the conversion work already:
https://github.com/recombinant/petzold-pw5e

I wish they hadn't stopped before finishing the task, though comparing the original code with the updated code makes for a tutorial for converting the unfinished code for yourself.