LPWSTR to std::string

Jul 5, 2012 at 10:40am
The Win32 API mostly works with wchar_t and LPWSTR, and those types seem to be incompatible with std::string. How can I simply convert an LPWSTR to std::string and a std::string to LPWSTR?

Also, ofstream seems to write only memory addresses to the file when I pass it a wide string (LPWSTR) - could that be fixed?
Jul 5, 2012 at 11:19am
You can use WideCharToMultiByte() and MultiByteToWideChar() to convert between ANSI and Unicode strings. Ideally you would use std::wstring instead, unless you absolutely have to use ANSI strings for some reason.

Oh, and you can explicitly call the ANSI version of a Windows function that takes a string parameter and let it do the conversion for you. The ANSI version has an A at the end of the function name, e.g. MessageBoxA instead of MessageBox.
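For instance (my own minimal sketch, not from the original post - the helper name and message text are made up), passing a std::string straight to the ANSI entry point looks like this:

#include <string>
#include <windows.h>

void show_message(const std::string &msg)
{
    // MessageBoxA takes narrow (ANSI) strings; Windows converts them
    // to UTF-16 internally before displaying the text.
    MessageBoxA(NULL, msg.c_str(), "Example", MB_OK);
}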
Last edited on Jul 5, 2012 at 11:23am
Jul 5, 2012 at 11:28am
How about with fstreams?
Jul 5, 2012 at 1:17pm
Well, I solved it with this:
#include <string>
#include <windows.h>

std::string wstrtostr(const std::wstring &wstr)
{
    // Ask WideCharToMultiByte how many bytes the converted string needs,
    // including the terminating '\0' (cchWideChar == -1 means the input
    // is treated as null-terminated).
    int len = WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, NULL, 0, NULL, NULL);
    char *szTo = new char[len];
    WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), -1, szTo, len, NULL, NULL);
    std::string strTo(szTo);
    delete[] szTo;
    return strTo;
}
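For the other direction (std::string back to std::wstring / LPWSTR), a matching sketch with MultiByteToWideChar - the helper name strtowstr is just mine, and it assumes the same CP_ACP code page as above - would be:

std::wstring strtowstr(const std::string &str)
{
    // Ask how many wide characters are needed, including the terminating L'\0'.
    int len = MultiByteToWideChar(CP_ACP, 0, str.c_str(), -1, NULL, 0);
    wchar_t *szTo = new wchar_t[len];
    MultiByteToWideChar(CP_ACP, 0, str.c_str(), -1, szTo, len);
    std::wstring wstrTo(szTo);
    delete[] szTo;
    return wstrTo;
}

The returned string's c_str() can then be passed to any API that expects an LPCWSTR.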
Jul 5, 2012 at 2:45pm
Errr... use std::wstring instead and leave the 20th century char in the past? You would then have full compatibility with LPWSTR.
Jul 5, 2012 at 8:09pm
The Windows version of std::wstring is hardly a step into the future, since it's frozen in the age of 16-bit "Unicode", retrofitted to hold UTF-16.

The ideal is std::u32string (holding UTF-32) for program logic and std::string (holding UTF-8) for I/O.
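As a rough illustration of that split (my own sketch, not from the thread, and with no error handling), the C++11 <codecvt>/<locale> facilities can move between UTF-8 in a std::string and UTF-32 in a std::u32string:

#include <codecvt>
#include <locale>
#include <string>

// UTF-8 (std::string) <-> UTF-32 (std::u32string) via std::wstring_convert.
std::u32string utf8_to_utf32(const std::string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}

std::string utf32_to_utf8(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}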
Last edited on Jul 5, 2012 at 8:09pm
Jul 5, 2012 at 8:17pm
It may not be cutting edge, but it is still a step forward.
Jul 5, 2012 at 8:18pm
About using u32string: how would I use it with the WinAPI or CSTDIO? It seems there is only support for UTF-16, at least with the MS CRT. Also, in a u32string, is a "character" an unsigned long (DWORD)?
Jul 5, 2012 at 9:09pm
About using u32string: how would I use it with the WinAPI or CSTDIO? It seems there is only support for UTF-16, at least with the MS CRT

Yes, WinAPI supports UTF-16 and in a few places, UTF-8. For UTF-32, there are standard C++ conversion routines (supported since VS 2010), multiple libraries (iconv, ICU), and, really, 32-16 conversion is trivial to write yourself.
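To give a sense of that last point, here is a hand-rolled UTF-32-to-UTF-16 conversion (my own sketch, with no validation of out-of-range code points or lone surrogates):

#include <string>

std::u16string utf32_to_utf16(const std::u32string &in)
{
    std::u16string out;
    for (char32_t cp : in)
    {
        if (cp < 0x10000)
        {
            // Code points in the Basic Multilingual Plane fit in one 16-bit unit.
            out += static_cast<char16_t>(cp);
        }
        else
        {
            // Everything above U+FFFF becomes a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}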

in a u32string, is a "character" an unsigned long (DWORD)?

No, it is a char32_t.
Jul 5, 2012 at 10:06pm
No, it is a char32_t.

And the typedef for char32_t is? Signed long, I bet.
Jul 5, 2012 at 10:32pm
It's not a typedef; in C++11, char32_t is a distinct built-in type (except in VS2010, where it is a typedef, but that's a bug).
Jul 5, 2012 at 10:36pm
Windows uses UTF-16 internally (Win32, COM, .NET, WinRT, resource strings, registry strings, and so on). Using anything else just adds overhead to your program. UTF-16 is already capable of representing characters beyond the 16-bit range with "surrogates". I don't see how UTF-32 is in any way ideal.
Jul 5, 2012 at 11:03pm
Windows uses UTF-16 internally

And that would have been fine if those internals were actually internalized.

I don't see how UTF-32 is in any way ideal.

It gives your strings a 1:1 correspondence between elements of storage and code points. The elements of the Windows version of std::wstring do not correspond to anything meaningful.
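A concrete illustration of that 1:1 point (my own example, not from the post): one emoji code point occupies two UTF-16 code units but a single UTF-32 element:

#include <cassert>
#include <string>

int main()
{
    // U+1F600 is a single code point, but it lies outside the BMP.
    std::u16string s16 = u"\U0001F600";  // stored as a surrogate pair
    std::u32string s32 = U"\U0001F600";  // stored as one element
    assert(s16.size() == 2);
    assert(s32.size() == 1);
}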
Jul 6, 2012 at 12:19am
Well, that post went over my head, to be quite honest; perhaps I should read up on localization. But I have to question the relevance of this to Windows application development, because you are the first person I have seen suggest using UTF-32 strings instead of UTF-16 despite the overhead involved: the extra memory required to store a single character, the constant allocating and freeing of temporary buffers for converted strings, and the actual process of converting. Why go through all of that when you can just give Windows what it expects?
Jul 6, 2012 at 10:43am
Well, in fact, while Windows' standard is UTF-16, even if you get UTF-32 from the user one way or another, you will still have to convert ("truncate") the values down to 16-bit units.

The only usefulness is with files: if you receive a UTF-32 file (maybe from a different OS that supports UTF-32), you will be able to read it correctly - but you will still have to convert it to display it.

Anyway, I'm probably not going to use UTF-32 as long as there is no Windows/STDIO UTF-32 standard. When there is, I'll expand my class to use char32_t.
Jul 6, 2012 at 12:33pm
Files (and other communication) are best in UTF-8, since there are no endianness issues and no stray zero bytes (unless you're in China, where Unicode is GB18030 :)

Take Linux for example: you take a std::wstring (UTF-32 there, as on almost every platform besides Windows), output to an std::wofstream or std::wcout, and get UTF-8 in file/on screen (if a utf8 locale is in effect, but that's the default on most distros). Same on the way back.
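A minimal sketch of that Linux flow (assuming glibc, where wchar_t is 32-bit, and that a locale such as "en_US.UTF-8" is installed; the file name is made up):

#include <fstream>
#include <locale>

int main()
{
    std::wofstream out("out.txt");
    // The imbued locale's codecvt facet converts the 32-bit wchar_t
    // characters to UTF-8 bytes on the way into the file.
    out.imbue(std::locale("en_US.UTF-8"));
    out << L"\U0001F34E\n";  // one code point in, several UTF-8 bytes out
}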

With UTF-16 in a wstring, you simply cannot *use* C++ I/O, because it's designed for 1-to-N and N-to-1 conversions only.