What is the encoding of CString?


Sep 6, 2022 at 6:08pm

CString is a wchar_t-based string type. What encoding does it use? And how can one convert a CString to a std::string?

Thanks!
Juan

Sep 6, 2022 at 6:32pm

kigar64551 (837)

CString simply is an alias for CStringT<TCHAR, ...>.

TCHAR is defined as either wchar_t or char, depending on whether your project is configured with the Unicode or the Multi-Byte character set.

There also are CStringA and CStringW that are defined as CStringT<char, ...> and CStringT<wchar_t, ...>, respectively.

Typically, char-based strings use whatever character-encoding is configured as "ANSI" Codepage on the system where the program runs.

And wchar_t-based strings typically use the Unicode character set with UTF-16 encoding. At least on Windows.

Last edited on Sep 6, 2022 at 7:02pm

Sep 6, 2022 at 6:38pm

JUANDENT (411)

OK, but if TCHAR is chosen as wchar_t, what is the CString encoding: UTF-16 or Unicode? And how can we convert this wide-character string to std::string?

Sep 6, 2022 at 6:42pm

kigar64551 (837)

Unicode is a character set that assigns a unique number ("code point") to each character.

How those Unicode characters (code points) are actually stored or transferred is defined by the specific encoding!

UTF-16 is one such Unicode encoding. UTF-8 is another popular Unicode encoding.

As said before, on Windows, where wchar_t is 16 bits in size, UTF-16 is typically used for "Unicode" strings. A single wchar_t is therefore a UTF-16 code unit, not necessarily a whole character.
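
To make the difference between a code point and its UTF-16 encoding concrete, here is a minimal sketch. The function name encode_utf16 is made up for illustration, it uses char16_t rather than wchar_t so that it behaves the same on every platform, and it assumes the input is a valid code point (at most U+10FFFF and not itself a surrogate).

```cpp
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::vector<char16_t> encode_utf16(char32_t cp)
{
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit holds the code point as-is.
        return { static_cast<char16_t>(cp) };
    }
    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    const char32_t offset = cp - 0x10000;
    return {
        static_cast<char16_t>(0xD800 + (offset >> 10)),   // high surrogate
        static_cast<char16_t>(0xDC00 + (offset & 0x3FF))  // low surrogate
    };
}
```

So 'A' (U+0041) takes one 16-bit unit, while an emoji such as U+1F604 takes two, which is why the length of a UTF-16 string counts code units rather than characters.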

______

std::string simply is a wrapper for a sequence of chars. It does not know or enforce any encoding, so it can store whatever "multi-byte" character encoding you like 😄

Possibilities include UTF-8 (Unicode) or Latin-1 (ISO 8859-1).

As far as the Win32 API is concerned, functions dealing with char-strings assume the "ANSI" Codepage configured on the local system.

You can use GetACP() to detect the "ANSI" Codepage that is configured on the current system...

________

To convert between char-based and wchar_t-based strings, see here:

https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte
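
On Windows, WideCharToMultiByte with the CP_UTF8 code page performs the UTF-16 to UTF-8 conversion for you. To show what such a conversion involves, here is a hand-rolled, portable sketch. It is not the Win32 implementation: the name utf16_to_utf8 is hypothetical, std::u16string stands in for the contents of a wide CString, and replacing unpaired surrogates with U+FFFD is a choice made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to a UTF-8 std::string.
// Unpaired surrogates are replaced with U+FFFD (a choice made for this sketch).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // stray low surrogate
        }

        // Emit 1 to 4 bytes depending on the size of the code point.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In a real Windows program the API call is the better route, since it is the documented one and also covers code pages other than UTF-8.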

Last edited on Sep 6, 2022 at 7:03pm

Sep 6, 2022 at 7:08pm

JUANDENT (411)

Ok to all that info... but still unanswered is "how can we translate from this wide character CString to std::string?"

How about this code:

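#include <locale>
#include <string>
#include <vector>

// Narrows each wchar_t to exactly one char via the locale's ctype facet.
// Characters without a single-char equivalent become '?', so multi-byte
// encodings such as UTF-8 are out of reach with this approach.
inline std::string to_string(const std::wstring& str, const std::locale& loc = std::locale{})
{
	std::vector<char> buf(str.size());
	std::use_facet<std::ctype<wchar_t>>(loc).narrow(str.data(), str.data() + str.size(), '?', buf.data());

	return std::string(buf.data(), buf.size());
}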

Last edited on Sep 6, 2022 at 7:10pm

Sep 6, 2022 at 7:12pm

kigar64551 (837)

This totally depends on two things:

1. Is your CString actually CStringT<char> or CStringT<wchar_t>?

2. What character encoding do you want your std::string to be encoded in? Latin-1? UTF-8? User's local "ANSI" codepage?

Sep 6, 2022 at 7:12pm

JUANDENT (411)

I get this deprecated in C++17!!

Sep 6, 2022 at 7:19pm

JUANDENT (411)

Ok:

CString is actually CStringT<wchar_t>.

What would be the solution if the desired character encoding was:

1- Latin-1 ???
2- UTF-8 ???
3- user's local ANSI codepage ???

How would we program these three ways of converting the CStrings?
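
For the UTF-8 and ANSI code page cases, the WideCharToMultiByte function linked earlier in the thread is the documented route, called with CP_UTF8 or CP_ACP respectively. The Latin-1 case can be sketched portably, because Latin-1 (ISO 8859-1) is exactly the first 256 Unicode code points. The following is only an illustration under stated assumptions: utf16_to_latin1 is a made-up name, std::u16string stands in for the contents of a CStringT<wchar_t>, and using '?' for characters that Latin-1 cannot represent is a choice made here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to ISO 8859-1 (Latin-1).
// Characters outside U+0000..U+00FF become '?' (a choice made for this sketch).
std::string utf16_to_latin1(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t unit = in[i];
        if (unit <= 0xFF) {
            // Latin-1 byte values are identical to the code point values.
            out += static_cast<char>(unit);
        } else {
            out += '?';
            // A surrogate pair is one character: skip its second half so
            // that it yields a single '?' rather than two.
            const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
            if (is_high && i + 1 < in.size()
                && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                ++i;
            }
        }
    }
    return out;
}
```

Whichever target encoding is chosen, the resulting std::string carries no record of it, so the code that consumes the string has to know which encoding was used.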