conversion between multi-byte and wchar_t, mbsrtowcs always fail

Aug 3, 2009 at 2:21pm

Greetings, everyone. In my program, I read an input file encoded in UTF-8 and want to convert the data into std::wstring. My platform is 32-bit Linux, where wchar_t is 32 bits wide. My idea is to use mbsrtowcs to convert the multibyte data into wchar_t. However, mbsrtowcs fails every time it is invoked.

Here is the code. It converts just one Chinese character. It compiles fine, but the program reports "Invalid or incomplete multi-byte or wide character" when mbsrtowcs is invoked. I am not sure whether mbsrtowcs is the right choice. Could anyone help me? Thanks in advance.

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
        wchar_t wbuf[BUFSIZ];
        size_t  nconv = 0;
        char    buf[BUFSIZ];
        const char *p = buf;
        
        // buf[0]~buf[2] hold the UTF-8 encoding of Chinese '单' (U+5355); buf[3] is the terminator
        buf[0] = -27;    //0xE5
        buf[1] = -115;   //0x8D
        buf[2] = -107;   //0x95
        buf[3] = 0;
        
        nconv = mbsrtowcs(wbuf, &p, 1, NULL);
        if(nconv == (size_t)-1)
        {
                perror("mbsrtowcs");
                exit(1);
        }
        else
                wbuf[nconv] = L'\0';
        
        return 0;
}


Aug 4, 2009 at 1:52am

helios (17607)

http://www.mkssoftware.com/docs/man3/mbsrtowcs.3.asp

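mbsrtowcs() doesn't just convert a UTF-8 string to a wide string. It converts a multibyte string to a wide string, and the format of the accepted string depends on the current locale (the LC_CTYPE category). A C program starts in the "C" locale, so unless setlocale() has been called to select a UTF-8 locale, the UTF-8 bytes are rejected as an invalid sequence.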
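To illustrate the locale dependence, here is a minimal sketch in C. The helper names select_utf8_locale and convert_with_locale are made up for this example, and the locale names it tries are an assumption about what is installed on the system. Calling setlocale(LC_CTYPE, "") to pick up the user's environment is another option if that environment is already UTF-8.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names.
   Which of them exist is system-dependent (an assumption here).
   Returns the name that was selected, or NULL if none worked. */
static const char *select_utf8_locale(void)
{
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}

/* Hypothetical helper: convert a NUL-terminated multibyte string to wide
   characters using the current LC_CTYPE locale. dst has room for dstlen
   wchar_t, including the terminator. Returns the number of wide characters
   written (terminator excluded), or (size_t)-1 on failure. */
static size_t convert_with_locale(const char *src, wchar_t *dst, size_t dstlen)
{
    mbstate_t state;
    const char *p = src;
    size_t n;

    if (dstlen == 0)
        return (size_t)-1;

    memset(&state, 0, sizeof state);      /* initial conversion state */
    n = mbsrtowcs(dst, &p, dstlen - 1, &state);
    if (n == (size_t)-1)
        return n;                         /* invalid sequence, errno == EILSEQ */

    dst[n] = L'\0';                       /* always terminate the result */
    return n;
}
```

With this sketch, the program from the question would call select_utf8_locale() once at startup and then convert_with_locale(buf, wbuf, BUFSIZ). If no UTF-8 locale can be selected, the conversion is expected to fail the same way as before.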

If you just want to convert UTF-8, I've previously posted a conversion function:
http://www.cplusplus.com/forum/general/7142/
I also have a newer one that uses std::strings.
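For readers who cannot follow the link, a locale-independent decoder might look roughly like the sketch below. This is not the function from the linked thread. The name utf8_decode is made up for this example, and it assumes wchar_t is wide enough to hold any code point, as on Linux.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 from src into dst.
   dst has room for dstlen wchar_t, including the terminator.
   Assumes wchar_t can hold any Unicode code point (32-bit wchar_t).
   Returns the number of code points written, or (size_t)-1 if the input
   is malformed or dst is too small. */
static size_t utf8_decode(const char *src, wchar_t *dst, size_t dstlen)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dstlen == 0)
        return (size_t)-1;

    while (*s != 0) {
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest value allowed for this length */
        int extra;          /* number of continuation bytes expected */
        int i;

        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        ++s;

        for (i = 0; i < extra; ++i, ++s) {
            if ((*s & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n + 1 >= dstlen)          /* keep one slot for the terminator */
            return (size_t)-1;
        dst[n++] = (wchar_t)cp;
    }

    dst[n] = L'\0';
    return n;
}
```

Because this never consults the locale, it behaves the same regardless of setlocale(). The trade-off is that it only handles UTF-8, whereas mbsrtowcs() follows whatever encoding the locale specifies.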

Last edited on Aug 4, 2009 at 1:52am
