Making a string with non-English letters

Veltas wrote:
Anyone know why the console stores these characters so differently?
For compatibility M$ decided that Code page 437 ("OEM") is used for the console:

http://en.wikipedia.org/wiki/Code_pages_on_Microsoft_Windows
coder777 wrote:
For compatibility M$ decided that Code page 437 ("OEM") is used for the console


Well that makes sense, really. In fact, that was roughly my guess.

Veltas wrote:
Also I think it changes the codes for different languages (to give support for other languages in a non-Unicode program), and for all we know it doesn't do that in the Command Prompt.


But this is, however, a big pain in the butt. Why couldn't they choose a code page that lines up with the Western codes if they were going to go ahead and use the Latin characters in the console anyway? I don't know....

EDIT: Now that I think about it, maybe it does make sense. If the Command Prompt ran all the DOS programs back in older Windows, then it would need to use the correct characters in the console or else they wouldn't draw graphics correctly or display the program's characters correctly.
For the sake of xantavis, and I'm sorry if someone's already provided this, could someone give a standard C++ way to ensure the console outputs using the correct character code? I think suddenly having to import win32 into a beginner's C++ program is a little too much.
Here's a quick reference:
standard C:
To enable whatever user's default settings are:
setlocale(LC_ALL, "");
To enable US English UTF-8:
setlocale(LC_ALL, "en_US.utf8"); // not on Windows
standard C++:
user's defaults:
std::locale::global(std::locale(""));
US English UTF-8:
std::locale::global(std::locale("en_US.utf8")); // not on Windows
you can then apply this or different language locale to each stream:
std::cout.imbue(std::locale()); // apply whatever was last set via global

On Windows, there are no Unicode locales, but there are standard C++ locale-independent Unicode conversions (since VS2010), and, since VS2005, there are Windows-only ways to switch the console to Unicode:
_setmode(_fileno(stdout), _O_U16TEXT); // use UTF-16
_setmode(_fileno(stdout), _O_U8TEXT); // use UTF-8
_setmode(_fileno(stdout), _O_WTEXT); // use wstrings 

(works on files even better, since the console fonts may be lacking)
It's a shame that our Windows C++ compilers don't seem to be able to cope with stuff like making sure our strings work in the Command Prompt, but then again it's the Command Prompt's fault as streams are supposed to be general so everything else would stop working if we changed it for Command Prompt.
@Cubbi

At first there were a lot of error messages, but then I found out that you have to use <io.h> and <stdio.h>. But it still won't work!
#include <iostream>
#include <io.h>
#include <stdio.h>
using namespace std;

int main () {
_setmode(_fileno(stdout), _O_U8TEXT);
cout << "mépris";
}




I wrote a little test program:
#include <iostream>
#include <windows.h>

using namespace std;

BOOL CALLBACK EnumCodePagesInstalledProc(LPTSTR lpCodePageString)
{
  cout << "Installed: " << lpCodePageString << endl;
  return TRUE; // forgot this
}

BOOL CALLBACK EnumCodePagesSupportedProc(LPTSTR lpCodePageString)
{
  cout << "Supported: " << lpCodePageString << endl;
  return TRUE; // forgot this
}


int main()
{
  EnumSystemCodePages(EnumCodePagesInstalledProc, CP_INSTALLED);
  EnumSystemCodePages(EnumCodePagesSupportedProc, CP_SUPPORTED);


  UINT oldcp = GetConsoleOutputCP();
  SetConsoleOutputCP(CP_UTF8);

  cout << "Hello world! mépris" << endl; // This output is wrong (UTF-8 string)
  cout << "Hello world! m" << char(130) << "pris" << endl; // This is right (code page 850)

  SetConsoleOutputCP(oldcp);

  return 0;
}


Unfortunately, on my system (Windows 7) it showed only a single code page for both calls [after the fix: it shows all kinds of code pages]. SetConsoleOutputCP(CP_UTF8) succeeds without complaint, but the output is still wrong.

For more info look at this:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms682064%28v=vs.85%29.aspx


Here is the code page 850 (the standard western code page for the console):

http://en.wikipedia.org/wiki/Code_page_850

The table only shows the characters above 127, which you may want to map like so:

std::map<std::string, char> map_from_utf8_to_console;
map_from_utf8_to_console["é"] = char(130);
Thanks, but actually, my goal was to create a string that uses non-English characters in as few lines as possible. Well, I just found an easy way, all by myself! I do the following:
#include <iostream>
using namespace std;

int main () {

    char mepris [6] = {'m', char(130), 'p', 'r', 'i', 's'}; // char(130): a bare 130 is a narrowing conversion in C++11 when char is signed


and if I want to print it out, I just create a for loop!

Would this be accepted by 'professionals'?
If you modify your code like so:
#include <iostream>
using namespace std;

int main () {

    char mepris [] = {'m', char(130), 'p', 'r', 'i', 's', 0}; // char(130): a bare 130 is a narrowing conversion in C++11 when char is signed

you don't need a loop



I mean you can do so, but if you have strings of unknown content you have to map the characters. It'd be no more than a function call when you want to display the string.

It's not that unusual to have a certain internal format and to need to convert it to another format for different occasions (I/O such as GUI, file, etc.).
xantavis wrote:
then I found out that you have to use <io.h> and <stdio.h>. But it still won't work!
_setmode(_fileno(stdout), _O_U8TEXT);
cout << "mépris";

Windows doesn't use UTF-8 for narrow character string literals, so use a wide literal and the wide stream instead:
_setmode(_fileno(stdout), _O_WTEXT);
wcout << L"mépris";
All that is unnecessary!
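typedef std::basic_ostream<char16_t> u16ostream;
typedef std::basic_streambuf<char16_t> u16streambuf;
// Relies on wchar_t and char16_t having the same representation (as on Windows); the standard does not guarantee this cast works.
u16ostream u16cout(reinterpret_cast<u16streambuf*>(std::wcout.rdbuf()));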

And then you can
u16cout<<u"m\u00E9pris";
http://www.cplusplus.com/forum/general/77259/
It all works for me, and it should for you!