That means you are not using ISO 8859-2. There is no reason to bring it up.
To make things clear, the character sequence "ČčŽž" is encoded as follows:
Unicode: U+010C U+010D U+017D U+017E
UTF-8: 0xC4 0x8C 0xC4 0x8D 0xC5 0xBD 0xC5 0xBE
ISO 8859-2: 0xC8 0xE8 0xAE 0xBE
Windows-1257: 0xC8 0xE8 0xDE 0xFE
IBM CP-775: 0xB6 0xD1 0xCF 0xD8
And so on: there was a multitude of code pages before UTF-8. Unless you have a file that holds 0xC8 0xE8 0xAE 0xBE as opposed to any of the other possible representations of this string, ISO 8859-2 is as irrelevant as CP775.
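As a quick check of the UTF-8 row above, here is a minimal sketch that encodes the four code points by hand. The names encode_utf8 and sample_utf8 are made up for this illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string encode_utf8(std::uint32_t cp) {
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The sample string U+010C U+010D U+017D U+017E as UTF-8.
std::string sample_utf8() {
    return encode_utf8(0x010C) + encode_utf8(0x010D) +
           encode_utf8(0x017D) + encode_utf8(0x017E);
}
```

sample_utf8() yields the eight bytes 0xC4 0x8C 0xC4 0x8D 0xC5 0xBD 0xC5 0xBE, two bytes per character.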
I need to include wcslen(wtext) to get number of characters |
wcslen(L"ČčŽž") is 4, regardless of locale, OS, or compiler settings.
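A small sketch of that claim, using universal character names so the result does not depend on how the source file itself is encoded. The function name sample_wide_length is only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Length, in wchar_t units, of the wide literal for U+010C U+010D U+017D U+017E.
std::size_t sample_wide_length() {
    const wchar_t* text = L"\u010C\u010D\u017D\u017E";
    std::wstring w(text);
    // std::wstring::size() and wcslen agree on a string with no embedded nulls.
    assert(w.size() == std::wcslen(text));
    return std::wcslen(text);
}
```

All four characters fit in a single wchar_t whether wchar_t is 16 or 32 bits wide, so the count is the same on Windows and Linux.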
If you're trying to apply it to text you've read from that UTF-8 file, your options are:
1. read as-is with std::ifstream, then
1.1 C-style: setlocale(LC_ALL, "en_US.utf8") and mbstowcs(NULL, s.c_str(), s.size()) (Not available on Windows)
1.2 C++11-style: wstring_convert with codecvt_utf8 to make a wstring, then just call its member size() (fully portable, Linux and Windows alike)
2. C++98-style: open with std::wifstream, imbue that with a utf8 locale, and read into a wstring. Then use member size() (Not available on Windows)
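Option 1.2 could look roughly like the sketch below. The function name count_utf8_characters is made up here, and the input is assumed to be the bytes already read from the file into a std::string. Note that std::wstring_convert and <codecvt> were deprecated in C++17, so newer compilers may warn about them.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Convert UTF-8 bytes to a wstring and report how many wchar_t units it holds.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::size_t count_utf8_characters(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    std::wstring wide = converter.from_bytes(utf8);
    return wide.size();
}
```

For the sample string this returns 4 for the 8 input bytes. Where wchar_t is 16 bits, codecvt_utf8<wchar_t> can only represent characters up to U+FFFF, which is enough for this text.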
If I try to mix cout with wcout, the second one, whether cout or wcout, would not work as expected, unless freopen is used to reset the output first.
|
Yes, that's an unfortunate problem of the C I/O: once you write a narrow or a wide character to stdout, it is locked in that mode until freopen'd. And std::cout/std::wcout both use C's stdout by default. So pick a mode and stick with it: outside Windows, I very strongly prefer UTF-8 (std::cout, not std::wcout). Windows doesn't support it, so there you're stuck with std::wcout or WinAPI. Some implementations make cout and wcout work simultaneously if you std::ios::sync_with_stdio(false), but not gcc.
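The "locked in that mode" behaviour can be observed with fwide, which reports a C stream's orientation. The sketch below uses a temporary file rather than stdout so it does not disturb the console; the two function names are made up for this illustration.

```cpp
#include <cstdio>
#include <cwchar>
#include <stdexcept>

// A freshly opened stream has no orientation yet: fwide(f, 0) reports 0.
int orientation_of_fresh_stream() {
    std::FILE* f = std::tmpfile();
    if (!f) throw std::runtime_error("tmpfile failed");
    int orientation = std::fwide(f, 0);   // query only, does not set anything
    std::fclose(f);
    return orientation;
}

// After a narrow write the stream is byte-oriented, and asking for wide
// orientation no longer changes it: fwide(f, 1) still reports a negative value.
int orientation_after_narrow_write() {
    std::FILE* f = std::tmpfile();
    if (!f) throw std::runtime_error("tmpfile failed");
    std::fputs("narrow", f);              // first write fixes the orientation
    int orientation = std::fwide(f, 1);   // request wide; too late to switch
    std::fclose(f);
    return orientation;
}
```

A positive result means wide-oriented, a negative one byte-oriented, and zero means not yet decided.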
why are you using <clocale>? |
That's another problem caused by std::cout/std::wcout both using C's stdout by default. You have to call std::setlocale to make the C I/O layer used by std::wcout work (when using gcc, at least; some other implementations make it work somehow). You don't need this if you're only using std::cout (such as when doing UTF-8 I/O) or if you're only using files and not the console. In gcc, setlocale is not needed to use wcout if you std::ios::sync_with_stdio(false) (but you still need wcout.imbue).
I am trying to port a win32 application with wide-exec-charset as UTF-16 to Linux. I know that Linux should have UTF-32 |
Windows does not use UTF-16 for its execution character set. It uses what used to be called "UCS2" (it was removed from Unicode standard), which refers to the 16-bit subset of Unicode. Any UCS2 code is also valid Unicode with the same meaning. This means any character Windows can handle, Linux can do as well: it's porting in the opposite direction that's hard.
In short, there is no need to change gcc's defaults in your case. Don't change input-charset or wide-exec-charset.
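To illustrate the difference between the 16-bit subset and full UTF-16 mentioned above, here is a minimal sketch of UTF-16 encoding. The name encode_utf16 is made up for this example. Characters up to U+FFFF take one 16-bit unit with the same value as the code point; only characters beyond that need a surrogate pair.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // 16-bit range: the code unit equals the code point.
        return { static_cast<std::uint16_t>(cp) };
    }
    // Outside the 16-bit range: split the remaining 20 bits into a surrogate pair.
    cp -= 0x10000;
    return { static_cast<std::uint16_t>(0xD800 | (cp >> 10)),
             static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)) };
}
```

For example, encode_utf16(0x010C) gives the single unit 0x010C, while encode_utf16(0x1F600) gives the pair 0xD83D 0xDE00. On Linux, where wchar_t is 32 bits, both fit in one wchar_t.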