Hexadecimal to UTF-8

May 9, 2012 at 9:43am
Hello,

I have a char with the value C5. I know that C5 in hex corresponds to the value 197 in decimal. This number (197) corresponds to a character in extended ASCII. I need to show (printf) C5 as, or convert it to, its corresponding UTF-8 character.

By the way I am doing it in C.

Can anyone help me out?
May 9, 2012 at 11:56am
To use printf in C to show the extended character, you could do it like this:

char star = '\xC5'; // just to show a char with the C5 value
printf("%c \n\n", star); // prints the raw byte; the symbol shown depends on the terminal's encoding
May 9, 2012 at 12:45pm
Are you trying to print U+00C5, Å? The proper way to do this is to use wide character I/O.

In C,

#include <wchar.h>
#include <locale.h>
int main()
{
    setlocale(LC_ALL, "");
    wchar_t c = L'\u00c5'; // or = L'\xc5';
    wprintf(L"%lc\n", c);
}


online demo: http://ideone.com/tpM2a

Note that this does not convert it to UTF-8. To convert to UTF-8 in C, use the wide-to-multibyte conversion, provided you are on a platform that supports UTF-8 locales (e.g. Linux, but not Windows):

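#include <stdlib.h>
#include <locale.h>
#include <stdio.h>
int main()
{
    // wctomb only produces UTF-8 if a UTF-8 locale is actually in effect
    if (!setlocale(LC_ALL, "en_US.utf8")) { // or any other .utf8 locale
        fputs("UTF-8 locale not available\n", stderr);
        return 1;
    }
    wchar_t c = L'\u00c5'; // or = L'\xc5';
    char mb[MB_CUR_MAX + 1];
    int len = wctomb(mb, c);
    if (len < 0) // c has no representation in the current locale
        return 1;
    mb[len] = '\0';
    printf("UTF-8 char: %s\n", mb);
}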


online demo: http://ideone.com/BoENC
Last edited on May 9, 2012 at 12:58pm
May 9, 2012 at 1:38pm
Thank you all.

Cubbi: I am sorry for my ignorance but I do not fully understand the need to build wchar_t c and also how to build it for other hex expressions.

Thanks a lot so far.
May 9, 2012 at 2:38pm
197 is not a valid value for a char on most systems (plain char is usually signed there, so CHAR_MAX == 127), while wchar_t is the type capable of holding any character, including yours.

Give an example of an "other hex expression".
May 9, 2012 at 4:03pm
Say, for example, A1.
May 9, 2012 at 4:09pm
You could just replace "c5" with "a1" in the examples above: http://ideone.com/7KEOx (printing as-is) and http://ideone.com/rHeAV (converting to UTF-8)
May 11, 2012 at 12:10am
Thanks a lot
May 11, 2012 at 8:33am
If I could get some extra help:
Let's say I have the following string: "DDD %C5 ir".
I want to print it, but with %C5 replaced by the corresponding UTF-8 character. The % is just there to mark where a hex value follows.
May 11, 2012 at 1:45pm
Assuming you can do string processing in C (which is somewhat tedious, compared to C++), it shouldn't be any more difficult than the samples above, but I am not entirely sure you're stating your goal with sufficient detail:

Do you want to print that string, showing the character Å (U+00C5) instead of the three-character fragment "%C5", or do you want to create a new string that holds the two-byte UTF-8 representation of that character ("\xc3\x85") instead of the three-character fragment "%C5", and which would, therefore, display Å when printed on a UTF-8 enabled terminal?
May 11, 2012 at 4:01pm
Hello,
Thank you for coming back.
I want to print that string, showing the character Å (U+00C5) instead of the three-character fragment "%C5".
Can you help, please?
May 11, 2012 at 4:24pm
Something like this could work:

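#include <wchar.h>
#include <locale.h>
#include <stdlib.h>

// Prints str as wide characters, replacing each "%<hex>" with the
// character that has that code, e.g. "%C5" becomes U+00C5.
void show(char* str)
{
    while(str && *str)
        if(*str != '%')
            // the cast keeps bytes above 0x7F from reaching btowc as negative values
            putwchar(btowc((unsigned char)*str++));
        else
            // strtoul consumes every hex digit after '%', so "%C5a" would be read as 0xC5A
            putwchar(strtoul(str+1, &str, 16));
}

int main()
{
    setlocale(LC_ALL, ""); // use the environment's locale for wide output
    char str[] = "DDD %C5 ir";
    show(str);
}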

online demo: http://ideone.com/jVGsW
Last edited on May 11, 2012 at 4:32pm
May 11, 2012 at 11:16pm
Thank you very much. It did work.
All the best