Conversion between multi-byte and wchar_t: mbsrtowcs always fails

Greetings, everyone. In my program, I read an input file encoded in UTF-8 and I want to convert the data into a std::wstring. My platform is 32-bit Linux, where wchar_t is 32 bits wide. My idea is to use mbsrtowcs to convert the multi-byte data into wchar_t. However, mbsrtowcs fails every time it is invoked.

Here is the code. It converts just one Chinese character. It compiles fine, but at run time the program complains "Invalid or incomplete multi-byte or wide character" when mbsrtowcs is invoked. I am not sure whether mbsrtowcs is the right choice. Could anyone help me? Thanks in advance.

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
        wchar_t wbuf[BUFSIZ];
        size_t  nconv = 0;
        char    buf[BUFSIZ];
        const char *p = buf;
        
        // buf[0]..buf[2] hold the UTF-8 encoding of Chinese '单' (U+5355); buf[3] is the terminator
        buf[0] = -27;    //0xE5
        buf[1] = -115;   //0x8D
        buf[2] = -107;   //0x95
        buf[3] = 0;
        
        nconv = mbsrtowcs(wbuf, &p, 1, NULL);
        if(nconv == (size_t)-1)
        {
                perror("mbsrtowcs");
                exit(1);
        }
        else
                wbuf[nconv] = L'\0';
        
        return 0;
}
http://www.mkssoftware.com/docs/man3/mbsrtowcs.3.asp

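mbsrtowcs() doesn't convert from UTF-8 specifically. It converts a multibyte string to a wide string, and the encoding it accepts is determined by the LC_CTYPE category of the current locale. A program starts in the "C" locale until it calls setlocale(), so unless you select a UTF-8 locale first, bytes such as 0xE5 are typically rejected as an invalid sequence.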
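To illustrate the locale dependence, here is a minimal sketch that selects a UTF-8 locale before calling mbsrtowcs(). The helper names use_utf8_locale and mb_to_wide are made up for this example, and the locale names it tries are assumptions: which ones exist varies from system to system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names (assumed; availability varies by
   system). Returns the name that was accepted, or NULL if none was. */
const char *use_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return names[i];
    return NULL;
}

/* Convert a NUL-terminated multibyte string to wide characters using the
   current LC_CTYPE locale. dst has room for cap elements, terminator
   included. Returns the number of wide characters written (terminator
   excluded), or (size_t)-1 on an invalid sequence or if cap is 0. */
size_t mb_to_wide(const char *src, wchar_t *dst, size_t cap)
{
    mbstate_t state;
    size_t n;

    if (cap == 0)
        return (size_t)-1;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    n = mbsrtowcs(dst, &src, cap - 1, &state);
    if (n != (size_t)-1)
        dst[n] = L'\0';                /* terminate even if output was truncated */
    return n;
}
```

Call use_utf8_locale() once at startup and check that it did not return NULL before calling mb_to_wide(). Passing "" to setlocale() instead picks up the user's environment, which only helps if that environment is itself a UTF-8 locale.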

If you just want to convert UTF-8, I've previously posted a conversion function:
http://www.cplusplus.com/forum/general/7142/
I also have a newer one that uses std::strings.
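For readers who want a locale-independent route, here is a rough sketch of a hand-written UTF-8 decoder. It is not the function from the linked thread. The name utf8_decode is hypothetical, and the code assumes wchar_t is wide enough to hold any Unicode code point, as it is with a 32-bit wchar_t.

```c
#include <stddef.h>
#include <wchar.h>

/* Decode NUL-terminated UTF-8 into wide characters without consulting the
   locale. dst has room for cap elements, terminator included.
   Returns the number of characters written (terminator excluded), or
   (size_t)-1 on malformed input or insufficient space. */
size_t utf8_decode(const char *src, wchar_t *dst, size_t cap)
{
    /* Smallest code point allowed for each number of continuation bytes. */
    static const unsigned long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    while (*s != 0) {
        unsigned long cp;
        int extra, i;

        /* The lead byte determines how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        ++s;

        for (i = 0; i < extra; ++i, ++s) {
            if ((*s & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n + 1 >= cap)             /* keep one slot for the terminator */
            return (size_t)-1;
        dst[n++] = (wchar_t)cp;
    }

    dst[n] = L'\0';
    return n;
}
```

With the bytes from the question (0xE5 0x8D 0x95), this writes the single code point U+5355 and returns 1, regardless of which locale is active.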