Hi to all mighty rulers of the scary and strange world of C++. I need to complete a task exceeding my powers as a C++ warrior, so I am asking for your precious help.
I have the following issue to solve:
I have a (char*) string like this: "010C00610075002000730076011B007400650021"
It is the hexadecimal representation of a UTF-16BE encoded string. (In the real world it means "Čau světe!", but this is not that important.)
The issue is how to convert this string to wchar_t*. I need a platform-independent solution (or at least one that compiles under Linux and Windows).
Any hint, piece of code or any kind of help from you, kings and mighty rulers of the C++ world, will be appreciated and paid back by writing "Thanks" under your post. ;)
There isn't a single conversion that will be correct for both Windows and Linux, as wchar_t is 16 bits wide and holds UTF-16 on Windows, but is 32 bits wide and holds UTF-32 on Linux. (In memory both use the machine's native byte order, so endianness only matters when you read or write raw bytes.)
In the Windows case, it should just be a matter of swapping the bytes of each pair for the data to become a wchar_t string, but I've not actually tried this. And I don't know how to convert from UTF-16 to UTF-32, other than by using a library.
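For what it's worth, here is a minimal sketch of how both cases could be covered by branching on sizeof(wchar_t). It is untested on real projects, and the function names (hex_value, parse_utf16be_hex, utf16_to_wstring) are made up for this post, not from any library. It assumes the input is well-formed hex whose length is a multiple of four. Because the hex digits are read left to right, each group of four already gives the numeric value of the big-endian unit, so no explicit byte swap is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one hex digit to its numeric value; throws on anything else.
static unsigned hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Parse "010C0061..." into UTF-16 code units (four hex digits per unit).
std::vector<std::uint16_t> parse_utf16be_hex(const std::string& hex) {
    if (hex.size() % 4 != 0)
        throw std::invalid_argument("length must be a multiple of 4");
    std::vector<std::uint16_t> units;
    units.reserve(hex.size() / 4);
    for (std::size_t i = 0; i < hex.size(); i += 4) {
        unsigned v = 0;
        for (std::size_t j = 0; j < 4; ++j)
            v = (v << 4) | hex_value(hex[i + j]);
        units.push_back(static_cast<std::uint16_t>(v));
    }
    return units;
}

// Build a std::wstring from UTF-16 code units.
// 16-bit wchar_t (Windows): units are copied unchanged.
// 32-bit wchar_t (typical Linux): surrogate pairs are combined into one code point.
std::wstring utf16_to_wstring(const std::vector<std::uint16_t>& units) {
    std::wstring out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t cp = units[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (sizeof(wchar_t) >= 4 && high && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00u);
            ++i;  // the low surrogate has been consumed
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

Calling utf16_to_wstring(parse_utf16be_hex(yourString)) and then c_str() on the result would give you a const wchar_t*. Unpaired surrogates are passed through unchanged here; stricter code might want to reject them.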
I don't know which characters have codepoints above U+FFFF, so I don't know whether I care about them. I need to represent correctly at least all Central European characters (like ščřžáéąęłżźöü... and some others). Do you know if they have codepoints under U+FFFF?
@andywestken:
One commonly used string format/encoding conversion library is libiconv.
I will definitely think about using the library.
Do you actually have to use wchar_t in your code?
No, I don't. I can probably rewrite parts of the code to use another string type. But after the conversion I need to write this hex UTF-16BE string to a text file (I use wofstream, so wchar_t seemed a good fit).
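If wchar_t is not a hard requirement, one alternative (just a sketch, not necessarily what anyone in this thread had in mind) is to convert the UTF-16 code units to UTF-8 in a plain std::string and write that with an ordinary std::ofstream. This behaves the same on Windows and Linux because it does not depend on the size of wchar_t or on the stream's locale. The names append_utf8, utf16_to_utf8 and write_utf8_file are hypothetical, and the code assumes you already have the code units in a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Append one Unicode code point to 'out' as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units to a UTF-8 string, combining surrogate pairs.
std::string utf16_to_utf8(const std::vector<std::uint16_t>& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t cp = units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00u);
            ++i;  // the low surrogate has been consumed
        }
        append_utf8(out, cp);
    }
    return out;
}

// Write the bytes unchanged; binary mode avoids newline translation on Windows.
bool write_utf8_file(const std::string& path, const std::string& text) {
    std::ofstream file(path.c_str(), std::ios::binary);
    file.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(file);
}
```

The resulting file is UTF-8 without a byte order mark, which most editors on both platforms open correctly.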
I don't know which characters have codepoints above U+FFFF, so I don't know whether I care about them. I need to represent correctly at least all Central European characters (like ščřžáéąęłżźöü... and some others). Do you know if they have codepoints under U+FFFF?
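Yes. All the characters you listed are well under U+FFFF (they sit below U+0180, in the Latin-1 Supplement and Latin Extended-A blocks). The codepoints over U+FFFF are mostly pretty obscure for ordinary European text.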
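If you want to check whether your real data ever goes above U+FFFF, a rough test is possible: in UTF-16 such characters are stored as surrogate pairs, whose code units fall in the range D800 to DFFF. The helper below is only an illustration (contains_surrogates is a made-up name) and assumes the input is valid hex with four digits per code unit.

```cpp
#include <cstddef>
#include <string>

// Returns true if any 4-digit group of the hex string is a surrogate code
// unit (0xD800..0xDFFF), i.e. the text uses code points above U+FFFF
// (or contains a stray surrogate).
bool contains_surrogates(const std::string& hex) {
    for (std::size_t i = 0; i + 4 <= hex.size(); i += 4) {
        // std::stoul throws std::invalid_argument if the group is not hex.
        unsigned long unit = std::stoul(hex.substr(i, 4), nullptr, 16);
        if (unit >= 0xD800 && unit <= 0xDFFF) return true;
    }
    return false;
}
```

If this returns false for all your inputs, every code unit is a complete character on its own.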
Great, thanks a lot.
For now I will try the Dishes solution, and if I ever need other characters too or have any problem with it, then I will use the library andywestken posted.