Dec 23, 2015 at 5:57pm UTC
The problem is that UTF-8 uses more than one byte for many characters.
The character ö, for example, is stored as two bytes (0xC3, 0xB6). Your method splits them into two separate elements of
hElemanlar, and neither byte is valid UTF-8 on its own, so they will not display correctly. You need to keep both bytes
together as one element of hElemanlar.
https://en.wikipedia.org/wiki/UTF-8#Description
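To make the byte layout concrete, here is a minimal sketch of how the first byte of a sequence tells you how many bytes belong to the character. The name utf8_sequence_length is made up for illustration, and the function only inspects the lead byte; it does not validate the bytes that follow.

```cpp
#include <cstddef>

// Hypothetical helper (not from any library): number of bytes in the UTF-8
// sequence that starts with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx: e.g. 0xC3, the first byte of ö
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx: e.g. kana
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // 10xxxxxx (continuation byte) or invalid
}
```

For ö, the first byte 0xC3 reports a length of 2, while the second byte 0xB6 is a continuation byte and cannot stand alone.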
Last edited on Dec 23, 2015 at 5:59pm UTC
Dec 23, 2015 at 6:05pm UTC
Thank you for your response. Can you guide me on how to split such characters?
Last edited on Dec 23, 2015 at 6:06pm UTC
Dec 23, 2015 at 6:36pm UTC
Here are a couple of ways:
#include <string>
#include <iostream>
#include <vector>
#include <locale>
#include <codecvt>
#include <clocale>
#include <cwchar>   // std::mbrlen, std::mbstate_t

int main()
{
    std::string arrWords[10] = {"ひらがな", "カタカナ"};

    // fully portable, using C++11 library
    {
        std::vector<std::string> hElemanlar;
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cvt;
        // decode to UTF-32, then re-encode each code point on its own
        for (char32_t c : cvt.from_bytes(arrWords[0]))
            hElemanlar.push_back(cvt.to_bytes(c));
        std::cout << "Printing arrWords[0] from hElemanlar...\n";
        for (std::string& c : hElemanlar)
            std::cout << c << '\n';
    }

    // portable except to Windows, using the C library
    {
        std::vector<std::string> hElemanlar;
        std::setlocale(LC_ALL, "en_US.utf8"); // any utf-8 locale works
        std::mbstate_t mb{};
        int len;
        for (const char* p = &arrWords[1][0], *end = p + arrWords[1].size(); p < end; p += len) {
            // length in bytes of the next multibyte character
            len = std::mbrlen(p, end - p, &mb);
            // stop on error (-1), incomplete sequence (-2), or a null byte (0),
            // which would otherwise loop forever because p would not advance
            if (len <= 0) break;
            hElemanlar.emplace_back(p, p + len);
        }
        std::cout << "Printing arrWords[1] from hElemanlar...\n";
        for (std::string& c : hElemanlar)
            std::cout << c << '\n';
    }
}
This gives:
Printing arrWords[0] from hElemanlar...
ひ
ら
が
な
Printing arrWords[1] from hElemanlar...
カ
タ
カ
ナ
demo:
http://coliru.stacked-crooked.com/a/b7c6f11e42a43b62
Last edited on Dec 23, 2015 at 6:40pm UTC
Dec 23, 2015 at 8:27pm UTC
It seems that codecvt isn't supported by GCC. Thank you very much anyway.
Dec 23, 2015 at 8:41pm UTC
It is supported by GCC as of version 5.0 (the demo link above uses GCC). Visual Studio and clang's libc++ have had it since about 2010.
If you have neither C++11 nor a non-Windows C library available, there are quite a few libraries for Unicode that you can use instead.
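If pulling in a library is not an option either, splitting by hand is possible, because the lead byte of each UTF-8 sequence encodes how long the sequence is. Below is a rough sketch, assuming the input is meant to be UTF-8. split_utf8 is a hypothetical helper, not part of the standard library or any Unicode library. It stops at the first malformed sequence rather than trying to recover, it does not reject overlong encodings, and it splits by code point, so a base letter followed by a combining mark ends up as two elements.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits UTF-8 text into one std::string per code point.
// Uses no locale and no <codecvt>; stops at the first malformed sequence.
std::vector<std::string> split_utf8(const std::string& text)
{
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);

        // The lead byte's high bits give the sequence length.
        std::size_t len = 0;
        if (lead < 0x80)                len = 1; // 0xxxxxxx
        else if ((lead & 0xE0) == 0xC0) len = 2; // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) len = 3; // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) len = 4; // 11110xxx

        // Invalid lead byte, or the sequence runs past the end of the text.
        if (len == 0 || len > text.size() - i) break;

        // Every following byte must look like 10xxxxxx.
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            if ((static_cast<unsigned char>(text[i + k]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
        }
        if (!ok) break;

        out.push_back(text.substr(i, len));
        i += len;
    }
    return out;
}
```

With this, something like hElemanlar = split_utf8(arrWords[0]); fills the vector without needing codecvt or a UTF-8 locale on the target system.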
Last edited on Dec 23, 2015 at 8:45pm UTC
Dec 23, 2015 at 9:17pm UTC
Thank you very much. I got it working.
Dec 26, 2015 at 11:38am UTC
Very strange... I write C++ in cocos2d-x. When I test the code mentioned above, it works on a Samsung Galaxy S4 and a Sony Xperia M4 Aqua, but so far it fails on a Samsung Galaxy S3 and a Samsung Tablet SM-T113...