Making a string with non-English letters

Hi,
I'd like to make some strings with non-English letters, for example the word débarquer. How would I make the string word1 equal to débarquer?

Can you use another type instead of string for these kinds of words? If I do
int main () 
{
string word1 ("débarquer");
}

it doesn't work.

Let's say it's not possible. I know that the ASCII for é is 130. So could I now do something like:
int main () 
{
char eSpecial = 130;
string word1 ("d" + eSpecial + "barquer");
}

?
If your compiler doesn't allow you to write that (most compilers do), use wstring, that is, wstring word1 = L"débarquer";
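Another way to write it in modern C++ is UTF-8, that is, string word1 = u8"débarquer"; (this form compiles in C++11 through C++17; since C++20 a u8 literal has type char8_t and no longer initializes a std::string directly)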

(side note: there is no ASCII for é)
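To make the UTF-8 idea a bit more concrete, here is a minimal sketch that spells out the bytes by hand, so it does not depend on how the source file was saved. The helper names `make_debarquer` and `join_with_byte` are made up for illustration. The second helper shows one way the "d" + eSpecial + "barquer" idea from the question can be made to compile: at least one operand has to be a std::string, because adding a char to a string literal is pointer arithmetic, not concatenation.

```cpp
#include <string>

// Builds "débarquer" from explicit UTF-8 bytes, so the result does not
// depend on the encoding of this source file.
// (make_debarquer is a hypothetical helper name.)
std::string make_debarquer()
{
    // é is U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9.
    // The literal is split in two because a \x escape keeps consuming
    // hex digits, and the 'b' and 'a' of "barquer" are valid hex digits.
    return "d\xC3\xA9" "barquer";
}

// Joins a prefix, one raw byte and a suffix. Because prefix is a
// std::string, operator+ really concatenates here.
// (join_with_byte is a hypothetical helper name.)
std::string join_with_byte(const std::string& prefix, char byte,
                           const std::string& suffix)
{
    return prefix + byte + suffix;
}
```

Note that `make_debarquer().size()` counts bytes, so the UTF-8 version is one longer than the nine letters of the word. Whether the result displays correctly still depends on what the output device expects.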
OK, then I'll have to change the compiler. Right now, I am using Code::Blocks. What (free) compiler would you recommend that allows you to use these characters?
I haven't seen one that doesn't, to be honest (but I haven't used Code::Blocks). gcc, clang, intel, are all fine with it.
Code::Blocks uses GCC by default, so it most likely supports this already. In the IDE settings, choose UTF-8 as the encoding used when saving files.

EDIT: Tried with Code::Blocks default settings and it works as expected. Please note that this was not a console project; outputting non-ASCII characters to a console is not an easy thing to do.
How do I use UTF-8?

The ASCII for é is 130, no???
This works for me, although it depends on my Windows installation's code page:
#include <windows.h>
#include <string>
using namespace std;

int main()
{
    string s ("débarquer");
    MessageBoxA (NULL, s.c_str(), NULL, MB_ICONINFORMATION);
    return 0;
}


It is advised to use only Unicode projects and std::wstring instead to overcome this limitation.
How do I use UTF-8?
With Code::Blocks you use UTF-8 automatically, but it doesn't help (nor does ASCII), since you need to convert it to the encoding the output uses.

The ASCII for é is 130, no???
Yes, it is, but only as an extended code: standard ASCII stops at 127, and 130 is é in console code pages such as 437 and 850.

http://www.asciitable.com/
OK. So I tried using wstring in a console application project, but then I get an error message. I did the following:
#include <iostream>
#include <string>
using namespace std;

int main () {
wstring word = L"mépris";
cout << word;
}


The operator << was not accepted. What's wrong now?
Look at this:

http://www.codeproject.com/Articles/34068/Unicode-Output-to-the-Windows-Console

You can change the code page of the console to UTF-8, which is what I'd recommend. Then you don't need wstring.
Edit (to my question above): OK, I now know that you have to place a w before cout, that is, use wcout. The program runs, BUT the é comes out as ú instead of é. So something is still wrong. What?
UTF-8, that's awesome! Thanks so much Cubbi, I had no idea!

Also, é is ANSI-ASCII. It should work in a normal char string and output correctly in the Windows console (but it won't work on Linux which uses just base ASCII).

EDIT: I love having a British keyboard, we can type é so easily. Do US keyboards have that too, out of interest? I'm pretty sure French keyboards do...
Linux uses UTF-8 pretty much always these days, so é, Ж, 猫, etc, in a regular std::string stores and outputs just fine. As intended in C and C++: both languages use multibyte character encoding for character strings, conversions, and I/O.
(conversions and I/O may require setlocale/locale::global of course)
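One practical consequence of storing UTF-8 in a regular std::string is that size() counts bytes, not letters. A small sketch of counting characters instead, assuming the string holds valid UTF-8 (`utf8_length` is a made-up helper name, not a standard function):

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded string.
// Every code point starts with a byte that is NOT of the form 10xxxxxx,
// so it is enough to skip those continuation bytes.
// Assumes the input is valid UTF-8; no validation is done.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "mépris" stored as UTF-8, size() reports 7 while `utf8_length` reports 6, because é takes two bytes.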
Cool, something else I didn't know Cubbi! I know it doesn't work with extended ASCII but that would make sense if it works with UTF-8!
@xantavis
On Windows, wstring/L"" uses the UTF-16 encoding, so it doesn't help either. The Windows console has its own code page, which is not ASCII. On Windows it suffices to change the code page to UTF-8 with SetConsoleOutputCP(). Then you can use cout without problems.

On Linux the output is, as far as I know, UTF-8.

1.
Changing the code page to UTF-8 suffice with SetConsoleOutputCP() for windows

Can you (or someone else) give me an example?

2. If I wanted to use a wstring, then I'd do the following:
#include <iostream>
#include <string>
using namespace std;

int main () {
wstring word = L"mépris";
wcout << word;
}

On the screen, it prints out múpris. So something is still wrong. What?

Sounds like Windows issues, which may be curable by _setmode(_fileno(stdout), _O_U16TEXT);
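A rough sketch of that call wrapped in a helper, assuming a Windows toolchain whose <fcntl.h> defines _O_U16TEXT (MSVC does; an older MinGW may not). `enable_wide_stdout` is a made-up name, and the non-Windows branch is only there so the snippet compiles everywhere.

```cpp
#include <clocale>
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Prepares stdout for wide-character output such as wcout << L"mépris".
// Returns true if the switch succeeded.
// (enable_wide_stdout is a hypothetical helper name.)
bool enable_wide_stdout()
{
#ifdef _WIN32
    // _setmode returns the previous translation mode, or -1 on failure.
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    // Best effort elsewhere: adopt the user's locale so wide output can
    // be converted to the terminal's encoding. Failure is ignored here.
    std::setlocale(LC_ALL, "");
    return true;
#endif
}
```

Call it once at the start of main and then stick to wcout for that stream. As far as I know, mixing in narrow cout or printf on the same stream after this switch is not supported by the Microsoft runtime.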
I wrote:
Look at this:

http://www.codeproject.com/Articles/34068/Unicode-Output-to-the-Windows-Console

you can change the code page of the console to UTF-8. What I'd recommend. Then you don't need wstring
Look at the example shown there. Switch to UTF-8 and you will see your special character.
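If the article is too much at once, the core of it can be sketched as a single helper. This is only a minimal sketch, assuming a Windows console; `use_utf8_console` is a made-up name, and on other systems it simply does nothing because the terminal is normally UTF-8 already.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Asks the Windows console to interpret program output as UTF-8.
// Returns true on success (always true on non-Windows systems).
// (use_utf8_console is a hypothetical helper name.)
bool use_utf8_console()
{
#ifdef _WIN32
    // SetConsoleOutputCP returns nonzero on success.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Call `use_utf8_console()` once at the top of main, before the first cout. The strings you print then have to be UTF-8 themselves, so either save the source file as UTF-8 or write the bytes explicitly, e.g. "m\xC3\xA9pris". On older Windows versions you may also need to select a TrueType font in the console properties for the character to show up.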
Windows has its own version of ANSI extended ASCII. In fact I don't think it was ever approved by ANSI. Also I think it changes the codes for different languages (to give support for other languages in a non-Unicode program), and for all we know it doesn't do that in the Command Prompt.

I made a quick test program for this:

#include <iostream>

int main()
{
	// First output all readable codes.
	for (unsigned char i = 0x20U; i; ++i) {
		std::cout << (unsigned) i << '\t' << i << std::endl;
	}
	
	// Then output specifically the 'é' character and its code.
	std::cout << std::endl << std::endl <<
		(unsigned) ((unsigned char) 'é') << '\t' << 'é' << std::endl;
	
	return 0;
}


Which gives output (with double ellipses for brevity) :

32
33      !
34      "
35      #
...
...
129     ü
130     é
131     â
...
...
232     Þ
233     Ú
234     Û
...
...
253     ²
254     ■
255      


233     Ú 


Anyone know why the console stores these characters so differently?
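My reading of that output, for what it's worth: the table looks like code page 850 (é at 130, Ú at 233), while the 'é' in the source file was apparently saved as byte 233, which is é in Latin-1 and Windows-1252. So the byte is the same, only the code page used to interpret it differs. A small sketch of converting such Latin-1 bytes to UTF-8, which avoids the ambiguity once the console is set to UTF-8 (`latin1_to_utf8` is a made-up name, and it assumes plain ISO-8859-1 input; Windows-1252 differs from it in the range 128 to 159):

```cpp
#include <string>

// Converts an ISO-8859-1 (Latin-1) string to UTF-8.
// Latin-1 byte values are identical to Unicode code points 0..255,
// so bytes below 0x80 are copied and the rest become two UTF-8 bytes.
// (latin1_to_utf8 is a hypothetical helper name.)
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string utf8;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8 += static_cast<char>(byte);
        } else {
            // 110000xx 10xxxxxx
            utf8 += static_cast<char>(0xC0 | (byte >> 6));
            utf8 += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return utf8;
}
```

With this, byte 233 (0xE9) becomes the pair 0xC3 0xA9, which is é in UTF-8.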
@coder777

Thanks for the link, but I don't understand it. As a rookie, this is too complicated for me. But I'd like to make a simple program (for me). As it is in French, I'd need the non-English characters. So could you maybe write an example function? Then I would only insert it where I have to. And once I have developed enough, I might also try to understand it. It's not professional, I know, but it should do the job.