Yeah, so I thought I wanted to learn Visual Studio 2017.
I've often heard people complain that managing strings in C is too much to deal with. It was difficult, but not impossible.
Microsoft (maybe to push their Visual Basic agenda?) has actually gone and made it that bad. These "wide strings" are nothing but a useless pain in the ass.
I mean, I guess having 16-bit characters is really important if you're writing in some obscure language on the other side of the world, but for the rest of us it's just another headache.
In this case, I think maybe it's one too many. This isn't worth it.
Every version of Windows since Windows 2000 uses the 16-bit character set internally. So if you are determined to use 8-bit characters, realize that every operating system call will have to pass through a translation layer to convert them to 16-bit. For example, if you call CreateWindowA(...), which is of course still supported, your string parameters will be converted to wide characters internally (you won't see it, but it'll happen, and it'll take time), and CreateWindowW(...) will be called. I'm just telling you how it is.
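Roughly like this, if you want to see the two entry points side by side (just a sketch; I'm using MessageBoxA/MessageBoxW instead of CreateWindow to keep it short, but the A/W split works the same way):

#include <windows.h>

int main(void)
{
    /* Narrow ("ANSI") call: the 8-bit strings get converted to UTF-16
       behind the scenes before the real, wide implementation runs. */
    MessageBoxA(NULL, "Hello from an 8-bit string", "A version", MB_OK);

    /* Wide call: this is what the OS speaks natively, no translation layer.
       L"..." gives you wchar_t (16-bit on Windows) string literals. */
    MessageBoxW(NULL, L"Hello from a 16-bit string", L"W version", MB_OK);

    return 0;
}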
I mean, it's only China, Japan, and Korea that reserve most of the Basic Multilingual Plane. What are the chances that someone will write in one of the languages of 22% of the world's population?
wchar_t on that system is 32 bits wide, so you don't have to worry about the difference between code points and code units. The problem is that it wastes space, and the internet is primarily UTF-8, so conversions become necessary anyway.
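Just to make that concrete (a sketch, assuming glibc and a UTF-8 locale; the literal is "naïve" spelled out as explicit UTF-8 bytes):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");   /* use the environment's (UTF-8) locale */

    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));   /* 4 on glibc, 2 on Windows */

    /* "naïve" as UTF-8 bytes: 6 bytes, but only 5 code points. */
    const char *utf8 = "na\xc3\xafve";
    wchar_t wide[16];
    size_t n = mbstowcs(wide, utf8, 16);
    if (n != (size_t)-1)
        printf("%zu UTF-8 bytes -> %zu wide characters\n", strlen(utf8), n);
    return 0;
}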
But I can never find a straight answer: does MS only support the BMP, or does Windows actually support full Unicode in the form of UTF-16? (MS used to implement UCS-2.)
mbozzi: Yes, on Linux wchar_t is quite wide, but all system APIs use UTF-8, so if your application doesn't specifically care about the content of strings, it can just use char everywhere and it'll (usually) work fine. If it only produces ASCII, it can talk directly to the system without doing useless conversions.
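For example (a sketch, assuming a Linux filesystem that stores names as raw bytes; the escapes spell "café.txt" in UTF-8):

#include <stdio.h>

int main(void)
{
    /* The kernel treats the path as an opaque byte string, so UTF-8
       passes straight through with no wide-character conversion. */
    FILE *f = fopen("caf\xc3\xa9.txt", "w");
    if (f) {
        fputs("plain char buffers all the way down\n", f);
        fclose(f);
    }
    return 0;
}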
Unicode strings in Windows are UTF-16. The documentation unfortunately doesn't differentiate between characters and code units.
Applications are of course free to interpret strings as UCS-2.
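To see the character/code unit distinction concretely (assuming a Windows toolchain where wchar_t is 16 bits): a code point outside the BMP takes two UTF-16 code units, and the C library counts units, not characters:

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* U+1F600 doesn't fit in one 16-bit code unit; UTF-16 encodes it
       as the surrogate pair 0xD83D 0xDE00. A UCS-2 view of the same
       buffer would just see two separate 16-bit "characters". */
    const wchar_t s[] = { 0xD83D, 0xDE00, 0 };

    /* wcslen counts code units, so it reports 2 for this one character. */
    printf("code units: %zu\n", (size_t)wcslen(s));
    return 0;
}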