How to convert a UNICODE string to ANSI

Jun 7, 2016 at 5:01pm
Hi,
I have

wchar_t wszMessage[1000];

and need to return char const*

Can somebody point me to the ways I can convert from wide strings above to normal chars?

If it could be done using the C++ Standard Library, that would be much better.

Thanks,
Juan
Jun 7, 2016 at 6:01pm
Based on your variable name "wszMessage" and terminology ("UNICODE" and "ANSI"), you're dealing with Windows, and you have a UTF-16 encoded string represented as an array of 16-bit wchar_t's.

The Windows API for such conversion is WideCharToMultiByte https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.110).aspx
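A minimal sketch of how that call is typically used, assuming you want the system ANSI code page (CP_ACP). The function name wide_to_ansi is made up for illustration, and the non-Windows branch is only a placeholder so the snippet compiles elsewhere; it is not what Windows does internally.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <cstddef>
#include <string>

// Hypothetical helper: converts a null-terminated wide string to a narrow one.
std::string wide_to_ansi(const wchar_t* wide)
{
#ifdef _WIN32
	// First call: passing a null output buffer asks for the required size in bytes.
	// With cchWideChar == -1 the size includes the terminating null.
	int needed = WideCharToMultiByte(CP_ACP, 0, wide, -1, nullptr, 0, nullptr, nullptr);
	if (needed <= 0)
		return std::string();

	// Second call: perform the conversion into a buffer of that size.
	std::string out(static_cast<std::size_t>(needed), '\0');
	WideCharToMultiByte(CP_ACP, 0, wide, -1, &out[0], needed, nullptr, nullptr);
	out.resize(static_cast<std::size_t>(needed) - 1); // drop the terminating null
	return out;
#else
	// Placeholder for other platforms (an assumption, not the Windows behaviour):
	// keep ASCII and replace everything else with '?'.
	std::string out;
	for (; *wide != L'\0'; ++wide)
		out += (static_cast<unsigned long>(*wide) < 0x80) ? static_cast<char>(*wide) : '?';
	return out;
#endif
}
```

Since you need to return char const*, note that the pointer from c_str() is only valid while the std::string it came from is alive and unmodified, so the string has to be stored somewhere that outlives the caller's use of the pointer.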

You asked for a C++ standard library approach. Because wchar_t on Windows is only 16 bits wide and so cannot hold every Unicode code point in a single element, even a standard C++ approach is non-portable. That holds even if you convert to UTF-8 (which is what most people would expect a string of "normal chars" to hold) rather than to "ANSI":

#include <cassert>
#include <codecvt>
#include <cwchar>   // std::wcslen
#include <iomanip>
#include <iostream>
#include <locale>   // std::wstring_convert
#include <string>

// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
int main()
{
	// This program is Windows-only: it assumes wchar_t is a 16-bit UTF-16 code unit.
	// On an implementation where wchar_t is 32 bits wide,
	// use codecvt_utf8 instead of codecvt_utf8_utf16 below.
	assert(sizeof(wchar_t) == 2);

	// 4 characters: 'z', U+00DF, U+6C34, and U+1F34C (which needs a surrogate pair)
	wchar_t wszMessage[1000] = L"z\u00df\u6c34\U0001f34c";

	std::cout << "the UTF-16 string contains " << std::wcslen(wszMessage) << " 16-bit code units: \n"; // prints 5
	std::cout << std::hex << std::setfill('0');
	for (wchar_t c : std::wstring(wszMessage))
		std::cout << std::setw(4) << static_cast<int>(c) << ' ';
	std::cout << std::dec << '\n';

	// Convert the UTF-16 wide string to a UTF-8 encoded std::string
	std::wstring_convert<std::codecvt_utf8_utf16<wchar_t, 0x10ffff, std::little_endian>> cvt;
	std::string str = cvt.to_bytes(wszMessage);

	std::cout << "the UTF-8 string contains " << str.size() << " bytes: \n"; // prints 10
	std::cout << std::hex << std::setfill('0');
	for (unsigned char c : str)
		std::cout << std::setw(2) << static_cast<int>(c) << ' ';
	std::cout << std::dec << '\n';
}


live demo: http://rextester.com/LRCT6549

This demo takes a string consisting of 4 characters, stores it in a wide string (which takes 5 wchar_t's on Windows and 4 wchar_t's on platforms where wchar_t is 32 bits wide), and then converts it to UTF-8 using the standard C++ library (the result takes 10 bytes, as expected for that particular string).
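
To show where the 5 code units and 10 bytes come from, here is a rough hand-written sketch of the same UTF-16 to UTF-8 conversion. It uses std::u16string instead of wchar_t so it does not depend on the platform's wchar_t size. The names append_utf8 and utf16_to_utf8 are made up for this illustration; they are not standard library functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: appends one Unicode code point to 'out' as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, char32_t cp)
{
	if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
		out += static_cast<char>(cp);
	} else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
		out += static_cast<char>(0xC0 | (cp >> 6));
		out += static_cast<char>(0x80 | (cp & 0x3F));
	} else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
		out += static_cast<char>(0xE0 | (cp >> 12));
		out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
		out += static_cast<char>(0x80 | (cp & 0x3F));
	} else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
		out += static_cast<char>(0xF0 | (cp >> 18));
		out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
		out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
		out += static_cast<char>(0x80 | (cp & 0x3F));
	}
}

// Hypothetical helper: converts UTF-16 to UTF-8, throwing on unpaired surrogates.
std::string utf16_to_utf8(const std::u16string& in)
{
	std::string out;
	for (std::size_t i = 0; i < in.size(); ++i) {
		char32_t cp = in[i];
		if (cp >= 0xD800 && cp <= 0xDBFF) {
			// High surrogate: it must be followed by a low surrogate.
			if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
				throw std::invalid_argument("unpaired high surrogate");
			// Combine the pair into one code point above U+FFFF.
			cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
			++i;
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			throw std::invalid_argument("unpaired low surrogate");
		}
		append_utf8(out, cp);
	}
	return out;
}
```

For u"z\u00df\u6c34\U0001f34c" the input holds 5 code units (the last character is a surrogate pair) and the output holds 1 + 2 + 3 + 4 = 10 bytes: 7a, c3 9f, e6 b0 b4, f0 9f 8d 8c.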
Last edited on Jun 7, 2016 at 6:10pm
Jun 7, 2016 at 7:20pm
Thanks!

Juan