Hi, I'm trying to read a Unicode (UTF-8) file one character at a time, but I don't know how to read just one character.
So can you tell me the easiest way to read a single character?
If your *platform* has UTF-8 support, there is nothing special to do. Just open your UTF-8 file as a wide-character stream:
#include <fstream>
#include <iostream>
#include <locale>
int main()
{
    std::locale::global(std::locale("")); // activate user-preferred locale, in my case en_US.utf8
    std::wifstream wf("test.txt"); // test.txt contains utf-8 text
    for(wchar_t c; wf.get(c); )
        std::wcout << "Processed character " << c << '\n';
}
(tested with GCC 4.6.2 on Linux)
If your compiler has sufficient C++11 support, you can use the locale-independent Unicode facilities such as std::codecvt_utf8 (not yet supported by GCC, but supported by Clang and Visual Studio).
(tested with clang++ 3.0 on Linux and Visual Studio 2010 SP1 on Windows 7)
Both examples were tested on a file that contains the bytes 7a c3 9f e6 b0 b4 f0 9d 84 8b, which are the UTF-8 encoding of the four characters zß水𝄋. (The Windows version of codecvt_utf8, as usual, fails miserably at the 𝄋, which is not representable in UCS-2.)