Unicode troubles...

I'm trying to output the logical "and" and "or" symbols.
I have

wchar_t sym_and=0x22C0, sym_or=0x22C1;

and write them via std::wcout. I also have

setlocale(LC_CTYPE,"");

as suggested at
http://www.cplusplus.com/forum/unices/13257/
Without it those characters don't show at all.
I get squares on screen. Checking the output in WinHex shows 3 (!) bytes per symbol:
E28B80
E28B81
in hex respectively, while ordinary symbols take only one byte each, e.g.
31 for '1', 2C for a comma, 0A for end of line.
I also tried the simplest possible program:
#include <clocale>   // std::setlocale
#include <iostream>

int main()
{
  wchar_t sym_and = 0x22C0, sym_or = 0x22C1;
  // Pick up the user's locale from the environment (C locale API).
  std::setlocale(LC_CTYPE, "");
  std::wcout << L"Hello, world!" << std::endl;
  std::wcout << sym_and << sym_or << std::endl;
  return 0;
}


Same result! What is going on?
Is something wrong with Cygwin, under which I'm doing all this?
Hmmm... No one else has the same issue?
Alas, printing Unicode to the terminal/console is black magic. (Still.) I'm planning to write an article on this for Windows users, but for cross-platform code you need to output UTF-8 strings.

As for the mechanics, forget the standard streams. That's right, they don't work right for this stuff on the console.

Use wprintf() (#include <cstdio>) and the %S format specifier (with a capital S) to print UTF-8 strings. Note that %S is not standard C: with Microsoft's runtime, %S in wprintf() takes a narrow (char) string, while on POSIX systems it is a synonym for %ls and takes a wide string.
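
If the format specifier gives trouble, one way to "output UTF-8 strings" without any format specifier at all is to write the already-encoded bytes straight to the stream. The following is only a minimal sketch of that idea, not the wprintf() approach described above; the names write_utf8 and kAndOr are made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Write an already-encoded UTF-8 string to a C stream, byte for byte.
// Returns true if every byte was written.
bool write_utf8(std::FILE* out, const std::string& utf8)
{
    return std::fwrite(utf8.data(), 1, utf8.size(), out) == utf8.size();
}

// U+22C0 and U+22C1 spelled out as UTF-8 bytes, so the result does not
// depend on the compiler's source or execution character set.
const std::string kAndOr = "\xE2\x8B\x80\xE2\x8B\x81\n";
```

A call such as write_utf8(stdout, kAndOr); would then send the six symbol bytes plus a newline. Don't mix this with wprintf() or std::wcout on the same stream: a C stream is either byte-oriented or wide-oriented, decided by the first operation on it.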


Your console must also be prepared to accept and display Unicode. Click the icon in the console window's title bar, open Properties, and change the font to "Lucida Console" or "Consolas" (whichever you prefer). Once done, type "chcp 65001" to switch the code page to UTF-8.

Good luck!
I get squares on screen. Checking the output in WinHex shows 3 (!) bytes per symbol:
E28B80

That's correct: \u22C0 in UTF-8 is 0xe2 0x8b 0x80
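
To see where those three bytes come from, here is a minimal sketch of the UTF-8 encoding rule. The helper names utf8_encode and to_hex are made up for this example, and the encoder does no validation (surrogates and values above U+10FFFF are not rejected).

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Sketch only: no checks for surrogates or out-of-range values.
std::string utf8_encode(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render bytes as upper-case hex, the way a hex editor shows them.
std::string to_hex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    for (unsigned char b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}
```

Anything from U+0800 to U+FFFF lands in the three-byte branch, so to_hex(utf8_encode(0x22C0)) gives E28B80, while '1' (0x31) stays a single byte.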

On a typical Unicode-supporting OS, such as Linux, you'd see the character you expect. The following works for me (note that you also mixed up the C and C++ locale APIs, although Linux is forgiving about that):

#include <iostream>
#include <locale>

int main()
{
  // Use the C++ locale API throughout: set the global locale from the
  // environment, then imbue wcout with it.
  std::locale::global(std::locale(""));
  std::wcout.imbue(std::locale());

  wchar_t sym_and = L'\u22C0', sym_or = L'\u22C1';
  std::wcout << L"Hello, world!" << '\n'
             << sym_and << sym_or << '\n';
}


If you're on Windows, you've got extra work to do: use the magical incantation _setmode(_fileno(stdout), _O_WTEXT); before any other output, and hope your screen font has those characters. Ref: http://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
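
As a rough sketch, the incantation can be wrapped so that the same source still builds elsewhere. The function name enable_wide_stdout is made up for this example, and the guard assumes the compiler defines _WIN32 only for native Windows builds (as far as I know, Cygwin's gcc does not define it by default, so a Cygwin build would take the no-op branch).

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>   // _O_WTEXT
#include <io.h>      // _setmode
#endif

// Switch stdout to wide-text mode on Windows; do nothing elsewhere.
// Call it before anything has been written to stdout.
bool enable_wide_stdout()
{
#ifdef _WIN32
    // _setmode returns the previous mode, or -1 on failure.
    return _setmode(_fileno(stdout), _O_WTEXT) != -1;
#else
    return true;  // no translation mode to switch outside Windows
#endif
}
```

After a successful call on Windows, stick to wide output (std::wcout, wprintf) on stdout for the rest of the program.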

That's correct: \u22C0 in UTF-8 is 0xe2 0x8b 0x80

Yep, checked via http://www.utf8-chartable.de/unicode-utf8-table.pl.
So this part is fine, at least on the glibc/libstdc++ side.

About translation mode: the problem is actually not in the terminal; I don't expect the Cygwin terminal to show my symbols correctly. But after redirecting the output to a file and opening it in Notepad++ I expected to see everything displayed correctly (that's why I mentioned WinHex; I'm working with regular files). I probably did something wrong in Notepad++, because it is really too good to fail at this.
Which good UTF-8 viewer could you suggest for Windows? I see Notepad++ doesn't work with those symbols.
Problem solved: I had the wrong font in Notepad++ (a non-Unicode one).