Sep 6, 2022 at 6:08pm UTC
CString is a wchar_t-based string type. What encoding does it use? How can one convert from CString to std::string?
Thanks!
Juan
Sep 6, 2022 at 6:32pm UTC
CString simply is an alias for CStringT<TCHAR, ...>.
TCHAR is defined as either wchar_t or char, depending on whether your project is configured with the Unicode or Multi-Byte character set.
There also are CStringA and CStringW, which are defined as CStringT<char, ...> and CStringT<wchar_t, ...>, respectively.
Typically, char-based strings use whatever character encoding is configured as the "ANSI" Codepage on the system where the program runs.
And wchar_t-based strings typically use the Unicode character set with UTF-16 encoding. At least on Windows.
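As a rough illustration of how one macro can switch the character type behind a single alias, here is a simplified model in portable C++. Everything prefixed with Demo/DEMO is a hypothetical stand-in invented for this sketch (DEMO_UNICODE for the project setting, demo_tchar for TCHAR, DemoStringT for CStringT); it is not how the real ATL/MFC headers are written.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the project's Unicode / Multi-Byte setting.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
using demo_tchar = wchar_t;   // "Unicode" configuration
#else
using demo_tchar = char;      // "Multi-Byte" configuration
#endif

// Simplified model of CStringT<CharT, ...>: one template, three aliases.
template <typename CharT>
struct DemoStringT {
    using char_type = CharT;
    std::basic_string<CharT> text;
};

using DemoStringA = DemoStringT<char>;        // plays the role of CStringA
using DemoStringW = DemoStringT<wchar_t>;     // plays the role of CStringW
using DemoString  = DemoStringT<demo_tchar>;  // plays the role of CString

// With DEMO_UNICODE set to 1, the generic alias is the wide variant.
static_assert(std::is_same<DemoString, DemoStringW>::value,
              "DemoString should be the wchar_t-based variant");
```

Setting DEMO_UNICODE to 0 in this sketch would make DemoString the same type as DemoStringA instead, and the static_assert would then fail.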
Last edited on Sep 6, 2022 at 7:02pm UTC
Sep 6, 2022 at 6:38pm UTC
Ok, but if TCHAR is chosen as wchar_t, what is the CString encoding: UTF-16 or Unicode? And how can we convert this wide-character string to std::string?
Sep 6, 2022 at 6:42pm UTC
Unicode is a character set that assigns a unique number ("code point") to each character.
How those Unicode characters (code points) are actually stored or transferred is defined by the specific encoding!
UTF-16 is one such Unicode encoding.
UTF-8 is another popular Unicode encoding.
As said before, on Windows, where wchar_t is 16 bits in size (one UTF-16 code unit; characters outside the Basic Multilingual Plane take two), UTF-16 is typically used for "Unicode" strings.
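To make the difference between a code point and its encoding concrete, here is a minimal sketch in portable C++ that encodes a single code point both ways. The function names encode_utf8 and encode_utf16 are made up for this example, and the code assumes it is given a valid Unicode scalar value (no range or surrogate checks).

```cpp
#include <string>

// Encode one code point as UTF-8 (1 to 4 bytes).
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (1 or 2 code units).
// Same assumption as above about cp being valid.
std::u16string encode_utf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;  // leaves a 20-bit value split across a surrogate pair
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}
```

For example, the code point U+20AC (the euro sign) becomes the three bytes E2 82 AC in UTF-8 but a single 16-bit unit 0x20AC in UTF-16, while U+1F604 needs four UTF-8 bytes and two UTF-16 code units.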
______
std::string simply is a wrapper for a sequence of chars. It can store whatever "multi-byte" character encoding you like 😄
Possibilities include UTF-8 (Unicode) or Latin-1 (ISO 8859-1).
As far as the Win32 API is concerned, functions dealing with char-strings assume the "ANSI" Codepage configured on the local system.
You can use GetACP() to detect the "ANSI" Codepage that is configured on the current system...
________
To convert between char-based and wchar_t-based strings, see here:
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte
Last edited on Sep 6, 2022 at 7:03pm UTC
Sep 6, 2022 at 7:08pm UTC
Ok to all that info... but still unanswered is "how can we translate from this wide character CString to std::string?"
How about this code:
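#include <locale>
#include <string>
#include <vector>
// Narrows each wchar_t individually via the locale's ctype facet;
// characters without a single-char representation become '?'.
inline std::string to_string(const std::wstring& str, const std::locale& loc = std::locale{})
{
    std::vector<char> buf(str.size());
    std::use_facet<std::ctype<wchar_t>>(loc).narrow(str.data(), str.data() + str.size(), '?', buf.data());
    return std::string(buf.data(), buf.size());
}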
??
Last edited on Sep 6, 2022 at 7:10pm UTC
Sep 6, 2022 at 7:12pm UTC
This totally depends on two things:
1. Is your CString actually CStringT<char> or CStringT<wchar_t>?
2. What character encoding do you want your std::string to be encoded in? Latin-1? UTF-8? The user's local "ANSI" codepage?
Sep 6, 2022 at 7:12pm UTC
I get a warning that this is deprecated in C++17!!
Sep 6, 2022 at 7:19pm UTC
Ok:
CString is actually CStringT<wchar_t>.
What would be the solution if the desired character encoding was:
1- Latin-1 ???
2- UTF-8 ???
3- user's local ANSI codepage ???
How would we program these 3 ways of converting the CString?
Sep 6, 2022 at 7:25pm UTC
Try something like this:
#include <Windows.h>
#include <atlstr.h>
#include <iostream>
#include <string>
#include <vector>

// Converts a UTF-16 CStringW to a char-based string in the given code page (e.g. CP_UTF8 or CP_ACP).
static std::string convert(const CStringW& str, const int targetEncoding)
{
    std::string result;
    // First call: query the required buffer size in bytes.
    const int size = WideCharToMultiByte(targetEncoding, 0U, str.GetString(), str.GetLength(), NULL, 0, NULL, NULL);
    if (size > 0)
    {
        std::vector<char> buffer(size);
        // Second call: perform the actual conversion into the buffer.
        const int ret = WideCharToMultiByte(targetEncoding, 0U, str.GetString(), str.GetLength(), buffer.data(), (int)buffer.size(), NULL, NULL);
        if ((ret > 0) && (ret <= size))
        {
            // Copy only the bytes that were actually written.
            result.assign(buffer.data(), static_cast<size_t>(ret));
        }
    }
    return result;
}

int main()
{
    CStringW input(L"Hello W0rld!");
    std::cout << '"' << convert(input, CP_UTF8) << '"' << std::endl;
}
The above function takes a CStringW as input, assuming that it contains a Unicode string in UTF-16 encoding.
The desired output encoding can be selected by the targetEncoding parameter. I chose CP_UTF8 (UTF-8) in the example.
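For the Latin-1 case asked about earlier, the convert() function above should also work when given the Windows code page identifier for ISO 8859-1, which to my knowledge is 28591. Because Latin-1 is simply the first 256 Unicode code points, the same conversion can also be sketched by hand in portable C++. The name utf16_to_latin1 and the '?' fallback are choices made for this example, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 to Latin-1 (ISO 8859-1).
// Code units up to 0xFF map directly to one byte; everything else
// cannot be represented and is replaced by the fallback character.
std::string utf16_to_latin1(const std::u16string& in, char fallback = '?')
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t unit = in[i];
        if (unit <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
            continue;
        }
        out.push_back(fallback);
        // If this is a surrogate pair, skip the low surrogate so that one
        // character outside the BMP yields one fallback, not two.
        const bool high = (unit >= 0xD800 && unit <= 0xDBFF);
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            ++i;
        }
    }
    return out;
}
```

On Windows, where wchar_t holds UTF-16 code units, the same loop could be applied to the characters of a CStringW; std::u16string is used here only to keep the sketch portable.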
Last edited on Sep 6, 2022 at 7:44pm UTC
Sep 6, 2022 at 7:27pm UTC
You can then get a C string from that CString, which can then be plugged into creating a C++ std::string.
How can we get a C string from a CStringT<wchar_t>??? As far as I know, it's not possible!!
Last edited on Sep 6, 2022 at 7:29pm UTC
Sep 6, 2022 at 7:39pm UTC
ok!!
Question: where did you get the value for UTF-8 (CP_UTF8)? From which header?
Last edited on Sep 6, 2022 at 7:39pm UTC
Sep 6, 2022 at 7:45pm UTC
It's defined by <Windows.h>, or by something that implicitly gets included when <Windows.h> is included.
You need to include <Windows.h> anyway, for the WideCharToMultiByte() function.
But, as said before, if you want the default "ANSI" Codepage of the local system, you can simply use the GetACP() function.
Last edited on Sep 6, 2022 at 7:48pm UTC
Sep 6, 2022 at 7:48pm UTC
CP_UTF8 means UTF-8.
CP_ACP means "whatever happens to be configured as the 'ANSI' Codepage on the local machine".
Note: On an English system, CP_ACP probably is Windows-1252, but it can be changed in the Windows control panel to something else.
Don't make any assumptions about what the local "ANSI" Codepage might be. It can be different on each computer!
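To show why that matters, here is a small portable sketch that decodes the same byte under two different single-byte code pages. The enum DemoCodepage and the function decode_high_byte are hypothetical names for this example, and the function deliberately handles only bytes in the range 0xC0..0xFF, where both mappings are simple.

```cpp
// Hypothetical helper for illustration only.
enum class DemoCodepage { Windows1252, Windows1251 };

// Returns the Unicode code point for one byte in 0xC0..0xFF.
// Assumption: the caller never passes a byte below 0xC0.
char32_t decode_high_byte(unsigned char byte, DemoCodepage cp)
{
    if (cp == DemoCodepage::Windows1252) {
        // In this range Windows-1252 matches Latin-1: byte value == code point.
        return static_cast<char32_t>(byte);
    }
    // Windows-1251 maps 0xC0..0xFF onto the Cyrillic letters U+0410..U+044F.
    return static_cast<char32_t>(0x0410 + (byte - 0xC0));
}
```

The byte 0xE9 decodes to U+00E9 (é) under Windows-1252 but to U+0439 (й) under Windows-1251, so a char-based string written on one machine can read as different text on another unless the encoding is known.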
Last edited on Sep 6, 2022 at 7:52pm UTC
Sep 6, 2022 at 8:10pm UTC
@kigar64551 great answer! But I have another related question: what would be the code to convert std::string to CStringW? And is there no way to do both of these conversions using only standard C++20?
Last edited on Sep 6, 2022 at 8:11pm UTC
Sep 6, 2022 at 8:40pm UTC
Last edited on Sep 6, 2022 at 8:41pm UTC
Sep 6, 2022 at 9:18pm UTC
Don't forget that std::string has a wide counterpart, std::wstring, so you could just use that without conversion. Maybe std::wstring would work for whatever you are doing?
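For the reverse direction asked about above (a char-based string to a wide string), the Win32 counterpart is MultiByteToWideChar(), linked earlier in the thread. As for standard C++ only: std::wstring_convert and the <codecvt> facets are deprecated since C++17, and as far as I know C++20 offers no direct replacement, so one option is a hand-written conversion. Below is a minimal sketch that decodes UTF-8 into UTF-16. The name utf8_to_utf16 is made up for this example, the input is assumed to be UTF-8 (not the "ANSI" codepage), and malformed input is replaced with U+FFFD, one per offending byte.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-16. Malformed bytes become U+FFFD.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const char32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes expected after the lead byte
        bool ok = true;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { ok = false; }  // stray continuation or invalid lead

        if (ok && i + extra >= in.size()) ok = false;     // sequence is truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (ok && (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }
        if (!ok) {
            out.push_back(u'\uFFFD');
            ++i;
            continue;
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring or a CStringW; that last step is not shown here because it depends on the Windows headers.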