Actually, Unicode is not for the console...
Unicode is mostly used in GUI applications...
[snip]
Why anyone would want to use Unicode on the console is a little strange.
|
Unicode is just a standardized way to represent text consisting of virtually any character, with each character having a unique identifier. Like an "all in one" character set. The alternative to this is to yutz around with locale settings in order to get anything beyond basic Latin characters and symbols. This may not seem like a big deal if you're an English speaker because ASCII has the entire English alphabet, but it's really a big mess.
The reason to use Unicode in a console program is no different than the reason to use Unicode in any other kind of program.
Say you make a simple console program to print a file, invoked from the command line:
printfile <filename>
Do you really want 'filename' to just be ASCII? That will make your program unusable (or at least more difficult to use) for printing files that might have foreign characters in the name.
Really, there's little reason not to use Unicode all the time (other than its nonexistent standard lib support).
the only way to support UTF-8 cross-platform in a console would be to use a library that translates a UTF-8 string to a corresponding ANSI string and outputs it... isn't that what those libraries do? |
I don't know for sure. I'm positive that Windows (and Linux, Mac, and any other OS worth using) uses Unicode internally, though. So I don't really think any conversion is necessary, because the OS will ultimately have to convert it back to Unicode anyway. This is why I'm so dumbfounded that it's so hard to get Unicode to output to the Win console -- you'd think it'd be easy!
Whether or not conversion is done, though, is another matter. Duoas said that wcout 'narrows' the string you give it before outputting it (*facepalm* then wtf is the point?), so the standard libs might be doing some conversion before they hand the data off to the actual OS, which might then convert it to something else.
I'm starting to think that maybe the way to go is to bypass the standard libs completely and stick with OS system calls (but hide them behind an abstract interface so you can port to other platforms by simply writing a new version of that interface). Maybe since WinAPI has SetConsoleOutputCP, there are functions to output to the console that don't use the standard libs (like ConsoleOut() or something). I'll have to look into that.
</rambling>
EDIT
-------------------------------------
There are, in fact, WriteConsole and ReadConsole WinAPI functions. I'm willing to bet that SetConsoleOutputCP will actually work with these functions... so UTF-8 is likely possible.
I say use UTF-8 text, and wrap output in a container class. Portability + internationalization + consistent output = win.
Here's a simple idea of what the container class for Windows might look like (but I didn't try to compile this, as I'm not on Windows, so this might not work at all)
|
#include <windows.h>
#include <cstring>

class Console
{
public:
    Console()
    {
        hInHandle  = GetStdHandle(STD_INPUT_HANDLE);
        hOutHandle = GetStdHandle(STD_OUTPUT_HANDLE);
        uInCP  = GetConsoleCP();        // remember the original code pages
        uOutCP = GetConsoleOutputCP();
        SetConsoleCP(65001);            // 65001 = CP_UTF8
        SetConsoleOutputCP(65001);
    }
    ~Console()
    {
        SetConsoleCP(uInCP);            // restore them on the way out
        SetConsoleOutputCP(uOutCP);
    }
    void Out(const char* text)
    {
        DWORD t;
        WriteConsole( hOutHandle, text, (DWORD)std::strlen(text), &t, NULL );
    }
    // for input, do something similar with hInHandle -- too lazy to write that routine
private:
    HANDLE hInHandle;
    HANDLE hOutHandle;
    UINT   uInCP;
    UINT   uOutCP;

    // to disallow copying
    Console(const Console&);
    Console& operator = (const Console&);
};
//-----------------------------------------------------
// to use
int main()
{
    Console c;
    char someunitext[] = "Текст на кирилица"; // "Cyrillic text" -- must be UTF-8 encoded;
                                              // I hope your compiler doesn't bork this
    c.Out(someunitext);
    return 0;
}
|
Hopefully that'll work. Try it and see. I'll keep my fingers crossed.
Basically we're having to rewrite cout to be less stupid about Unicode.
*shakes fist at the C++ standard libs*
ANOTHER EDIT:
There are "Unicode" and "ANSI" versions of WriteConsole and ReadConsole (WinAPI does this for lots of its functions). Basically the real functions are WriteConsoleW and WriteConsoleA, and 'WriteConsole' just gets #defined as one of them depending on whether or not UNICODE was defined.
Since we're using UTF-8 and char* here, we actually might want the non-Unicode version. So instead of WriteConsole, you might want to use WriteConsoleA. Try all 3 and see which work and which don't.
Or, you could use the wchar_t version, but I'd avoid that because the width of wchar_t varies across platforms (is it UTF-16? UTF-32? no way to know -- but we can agree to treat char* as always UTF-8).