std::string to std::wstring with extended characters (unicode)

Apr 21, 2013 at 10:24pm
Hi, I have an app that has to handle some extended characters, such as those in the Latin-1 Supplement and possibly the Latin Extended-A Unicode blocks (see http://en.wikipedia.org/wiki/List_of_Unicode_characters, for example), so my compiler is set to use the Unicode character set.

I get a std::string that includes such characters (e.g. "ú", "é", and others), and I need to output a wchar_t*.

Can wchar_t* even handle these characters?



This is how I usually convert:

std::string input = "Bancé";

std::wstring wCmd = std::wstring(input.begin(), input.end()); //(1)

WCHAR* wCCmd = const_cast<WCHAR*>(wCmd.c_str());  // wchar_t typedef

SQLWCHAR* output = wCCmd; //wchar_t typedef



I am using Windows typedefs, but WCHAR* and SQLWCHAR* are just the same type, wchar_t*, i.e. what I want as output.

This conversion usually works, but for a string such as "Bancé" above, at step (1) (i.e. the conversion from std::string to std::wstring) the extended character "é" becomes "←" (i.e. "Bancé" becomes "Banc←").

What can I do to use extended Unicode characters (at least Latin-1 Supplement, and possibly, Latin Extended-A) in my std::wstring and the types that follow it?

(I guess it comes down to converting a char that supports these characters to a wchar_t that supports these characters, but, in the end, I am using std::string as input, so I kept it that way in code).

Does it depend on my compiler settings, or something else entirely?




Thanks for any help!!! :)

C :)

Last edited on Apr 21, 2013 at 10:26pm
Apr 21, 2013 at 10:52pm
Try putting an L before your string literal to make it a wide string literal, and store it in a std::wstring:
std::wstring input = L"Bancé";
Apr 21, 2013 at 10:59pm
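On Windows (in Visual Studio specifically), when you write "Bancé", you're actually writing "Banc\xe9": the é is stored as a single byte of the system's ANSI code page, typically Windows-1252. (On sane systems, such as Linux, you actually get the UTF-8 bytes "Banc\xc3\xa9", but that's another story.)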

When you use wstring's range constructor, each char in your string is converted to a wchar_t as if by static_cast. Because char is signed by default in Visual Studio, '\xe9' is a negative value, so it gets sign-extended into L'\xffe9', which you see as a box. What you're looking for is L'\x00e9', which you can get if you use a real multibyte-to-wide conversion. There are many in C++; here's one that works in this case (it relies on the default locale's ctype::widen()):

#include <iostream>
#include <string>
#include <sstream>
#include <cstdio>   // _fileno
#include <fcntl.h>  // _O_WTEXT
#include <io.h>     // _setmode

int main()
{
    std::string input = "Bancé";
    std::wostringstream conv;
    conv << input.c_str(); // inserting a const char* into a wide stream widens each char
    std::wstring wCmd(conv.str());

    _setmode(_fileno(stdout), _O_WTEXT); // MSVC's special needs
    std::wcout << wCmd << '\n';
}
Last edited on Apr 21, 2013 at 11:03pm
Apr 21, 2013 at 11:29pm
Awesome, I'll have to use that more often! Thanks for your help!! :)
Apr 22, 2013 at 4:12am
Thanks For Use.