I'm writing a program that manipulates HTML files encoded in UTF-8; in particular, they contain Arabic characters. Copying entire files byte by byte with the char type works just fine, but when I try to extract certain portions of the text, things get messy. I understand why that poses a problem (an Arabic letter takes more than one byte in UTF-8, so slicing by char can cut a character in half), and I've done some research online, but I haven't been able to find a clear approach for dealing with UTF-8.
I'm writing in C++ using the freely available Dev-C++ IDE, and the program is a console application.
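
To make the extraction problem concrete, here is a minimal sketch of counting and slicing by code point instead of by char. It assumes the text is already held in a std::string of valid UTF-8 bytes, and the helper names utf8_length and utf8_substr are hypothetical (they are not from any library). It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, and it sticks to older C++ so it should also build with the compiler bundled with Dev-C++.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// Number of code points in s. Assumption: s holds valid UTF-8.
// Every byte that is not a continuation byte starts a new code point.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            ++n;
        }
    }
    return n;
}

// Moves byte index i forward past one whole code point (lead byte plus
// any continuation bytes that follow it).
static void skip_code_point(const std::string& s, std::size_t& i) {
    ++i;
    while (i < s.size() && is_continuation(static_cast<unsigned char>(s[i]))) {
        ++i;
    }
}

// Returns up to `count` code points starting at code point index `start`.
// Indices are in code points, not bytes, so a character is never split.
std::string utf8_substr(const std::string& s, std::size_t start, std::size_t count) {
    std::size_t i = 0;
    for (std::size_t cp = 0; cp < start && i < s.size(); ++cp) {
        skip_code_point(s, i);
    }
    const std::size_t begin = i;
    for (std::size_t cp = 0; cp < count && i < s.size(); ++cp) {
        skip_code_point(s, i);
    }
    return s.substr(begin, i - begin);
}
```

For example, the four-letter Arabic word stored as the bytes "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85" occupies 8 bytes but 4 code points, and utf8_substr(word, 1, 2) returns the 4 bytes of its second and third letters intact. Note that this works at the level of code points only: a base letter followed by combining marks (such as Arabic diacritics) is still several code points, so a sketch like this could separate a letter from its marks.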