Mixing std::wstring with ICU?


I have decided to use ICU for some parts of my project. The project is Windows-only and interacts with the WinAPI. Is it safe to use std::wstring for this purpose? (I am not sure whether std::wstring uses UTF-16 as its default encoding, which is what Windows uses.)

A while back I found this small set of template functions for converting std::wstring to std::string (and back again), for example to store UTF-8 text in files. Is this sufficient?

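#include <string>
#include <vector>
#include <unicode/ustring.h> // u_strFromWCS, u_strToUTF8, u_strFromUTF8, u_strToWCS

/** Converts a std::wstring into a std::string with UTF-8 encoding.
 */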
template < typename StringT >
StringT utf8 ( std::wstring const & rc_string );

/** Converts a std::string with UTF-8 encoding into a std::wstring.
 */
template < typename StringT >
StringT utf8 ( std::string const & rc_string );

/** Nop specialization for std::string.
 */
template < >
inline std::string utf8 ( std::string const & rc_string )
{
  return rc_string;
}

/** Nop specialization for std::wstring.
 */
template < >
inline std::wstring utf8 ( std::wstring const & rc_string )
{
  return rc_string;
}

template < >
std::string utf8 ( std::wstring const & rc_string )
{
  std::string result;
  if(rc_string.empty())
    return result;

  std::vector<UChar> buffer;

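  result.resize(rc_string.size() * 3); // worst case: 3 UTF-8 bytes per UTF-16 code unit (assumes 16-bit wchar_t, as on Windows)
  buffer.resize(rc_string.size() * 2); // worst case: 2 UTF-16 code units per wchar_t (only reached where wchar_t is 32-bit)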

  UErrorCode status = U_ZERO_ERROR;
  int32_t len = 0;

  u_strFromWCS(
    &buffer[0],
    buffer.size(),
    &len,
    &rc_string[0],
    rc_string.size(),
    &status
  );
  if(!U_SUCCESS(status))
  {
    throw XXXException("utf8: u_strFromWCS failed");
  }
  buffer.resize(len);

  u_strToUTF8(
    &result[0],
    result.size(),
    &len,
    &buffer[0],
    buffer.size(),
    &status
  );
  if(!U_SUCCESS(status))
  {
    throw XXXException("utf8: u_strToUTF8 failed");
  }
  result.resize(len);

  return result;
}/* end of utf8 ( ) */


template < >
std::wstring utf8 ( std::string const & rc_string )
{
  std::wstring result;
  if(rc_string.empty())
    return result;

  std::vector<UChar> buffer;

  result.resize(rc_string.size());
  buffer.resize(rc_string.size());

  UErrorCode status = U_ZERO_ERROR;
  int32_t len = 0;

  u_strFromUTF8(
    &buffer[0],
    buffer.size(),
    &len,
    &rc_string[0],
    rc_string.size(),
    &status
  );
  if(!U_SUCCESS(status))
  {
    throw XXXException("utf8: u_strFromUTF8 failed");
  }
  buffer.resize(len);

  u_strToWCS(
    &result[0],
    result.size(),
    &len,
    &buffer[0],
    buffer.size(),
    &status
  );
  if(!U_SUCCESS(status))
  {
    throw XXXException("utf8: u_strToWCS failed");
  }
  result.resize(len);

  return result;
}/* end of utf8 ( ) */

Usage:

std::string s = utf8<std::string>(std::wstring(L"some string"));
std::wstring w = utf8<std::wstring>(std::string("some string"));

I prefer std::wstring to UnicodeString, but I'm not sure what exactly I should use.
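
On the buffer sizes in the two conversion functions: the worst-case ratios between UTF-8 bytes and UTF-16 code units can be checked with a small sketch. The helper names utf8_length and utf16_length are hypothetical and not part of ICU; they only count how many units one code point needs in each encoding.

```cpp
#include <cstddef>

// Bytes needed to encode one Unicode code point in UTF-8.
constexpr std::size_t utf8_length(char32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// 16-bit code units needed to encode one code point in UTF-16.
constexpr std::size_t utf16_length(char32_t cp)
{
    return cp < 0x10000 ? 1 : 2;
}

// U+FFFF is the worst case per code unit: 3 bytes for a single UTF-16 unit.
static_assert(utf8_length(0xFFFF) == 3 && utf16_length(0xFFFF) == 1,
              "BMP worst case");
// Code points above U+FFFF take 4 bytes, but spread over 2 UTF-16 units.
static_assert(utf8_length(0x10FFFF) == 4 && utf16_length(0x10FFFF) == 2,
              "supplementary plane");
```

So a buffer of 3 bytes per UTF-16 code unit is enough going to UTF-8, and going the other way the number of UTF-16 code units never exceeds the number of UTF-8 bytes. Both statements assume wchar_t is 16 bits wide, as it is on Windows.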

webJose (2948)

I am 99% sure that the encoding used has nothing to do with std::string or std::wstring. These are just generic string classes based on char and wchar_t respectively. Whatever interpretation you want to give to the data in them is up to you. Basically, std::string stores each char in a single byte, while std::wstring stores each wchar_t in two bytes on Windows (the size of wchar_t is implementation-defined, and it is four bytes on most other platforms).
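
A minimal sketch of that point, using only the standard library. The function names emoji_length and e_acute_length are made up for illustration. Both strings simply count storage elements; the encoding is whatever the programmer (or the compiler, for literals) put into them.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t elements the compiler uses for U+1F600, a code point
// outside the Basic Multilingual Plane:
// 2 where wchar_t is 16 bits (a UTF-16 surrogate pair, e.g. on Windows),
// 1 where wchar_t is 32 bits (e.g. most Linux toolchains).
inline std::size_t emoji_length()
{
    const std::wstring emoji = L"\U0001F600";
    return emoji.size();
}

// std::string counts bytes, not characters: U+00E9 written out as UTF-8
// occupies two char elements.
inline std::size_t e_acute_length()
{
    const std::string e_acute = "\xC3\xA9";
    return e_acute.size();
}
```

On Windows the first function should return 2, which is why a std::wstring holding WinAPI text is in practice a sequence of UTF-16 code units, even though the class itself knows nothing about UTF-16.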

rem45acp (48)

OK, that answers most of my question. Thank you. However, are those two conversion templates doing a proper conversion between the two string classes? I can't contact the original author, as I can't find where I got them.

webJose (2948)

I have no idea if the conversions are OK. In Windows, I would have used MultiByteToWideChar(). See http://msdn.microsoft.com/en-us/library/dd319072(v=vs.85).aspx for details.
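
A rough sketch of the MultiByteToWideChar() route, assuming the input is UTF-8 and short enough for its length to fit in an int. The wrapper name widen_utf8 is hypothetical. The Windows-specific part is guarded with _WIN32, so the snippet compiles to nothing on other platforms.

```cpp
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Converts UTF-8 text to a UTF-16 std::wstring using the WinAPI.
std::wstring widen_utf8(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();

    const int in_len = static_cast<int>(utf8.size());

    // First call: a zero-sized output buffer asks for the required length.
    const int out_len = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), in_len, nullptr, 0);
    if (out_len <= 0)
        throw std::runtime_error("widen_utf8: invalid UTF-8 input");

    // Second call: perform the conversion into a buffer of exactly that size.
    std::wstring result(static_cast<std::size_t>(out_len), L'\0');
    const int written = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), in_len, &result[0], out_len);
    if (written != out_len)
        throw std::runtime_error("widen_utf8: conversion failed");

    return result;
}
#endif
```

The opposite direction would use WideCharToMultiByte() with the same two-call pattern. Compared with the ICU templates, this avoids the intermediate UChar buffer, at the cost of being Windows-only.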
