Encode Char to UTF8

Jul 19, 2012 at 9:29am

Hi! My question is simple. I need to encode a char array like "áéí" in UTF-8. For example, 'á' must be converted to "Ã¡".

Thanks in advance!

Jul 19, 2012 at 10:09am

Peter87 (11251)

It's not clear what you mean. You want to change some text in some other encoding to UTF8?

Jul 19, 2012 at 10:50am

viliml (791)

If you use C++11, char is UTF-8 by default.

Jul 19, 2012 at 10:57am

Peter87 (11251)

@viliml
That's not true.

Jul 19, 2012 at 11:03am

Javi OD (4)

I have a char with accents. I want to encode this char to utf8.

List of characters and their UTF-8 form:

á => Ã¡
À => Ã€
ä => Ã¤
é => Ã©
è => Ã¨
É => Ã‰
ê => Ãª
æ => Ã¦
í => Ã + soft hyphen (byte 0xAD, usually invisible)
ó => Ã³
Ó => Ã“
ö => Ã¶
ú => Ãº
ü => Ã¼
ñ => Ã±
Ñ => Ã‘
ç => Ã§

Jul 19, 2012 at 11:30am

Peter87 (11251)

I think what viliml meant was that in C++11 you can write UTF-8 encoded string literals as u8"I'm a UTF-8 string." This is done at compile time, so it will not help you if you want to encode strings at runtime.

If you want to convert the string to UTF8 (at runtime) you first have to know what encoding the original string is using.
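As a rough illustration of that point, here is a minimal sketch that assumes the original string is ISO-8859-1 (Latin-1). That is only an assumption: nothing in the thread confirms the source encoding, and Windows-1252 differs from Latin-1 in the range 0x80..0x9F. The function name latin1_to_utf8 is made up for this example. It avoids C++11 features, so it should also build with older compilers.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumption: the input really is Latin-1. Each Latin-1 byte equals its
// Unicode code point, so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // 110xxxxx 10xxxxxx: top 2 bits, then low 6 bits.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With this sketch, the Latin-1 byte 0xE1 ('á') becomes the two bytes 0xC3 0xA1. Those two bytes are what shows up as "Ã¡" when UTF-8 output is viewed as Latin-1.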

Jul 19, 2012 at 11:50am

viliml (791)

Wikipedia wrote:
For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be both at least the size necessary to store an eight-bit coding of UTF-8 and large enough to contain any member of the compiler's basic execution character set. It was previously defined as only the latter.

And:

Wikipedia wrote:

u8"I'm a UTF-8 string."
u"This is a UTF-16 string."
U"This is a UTF-32 string."

The type of the first string is the usual const char[]. The type of the second string is const char16_t[]. The type of the third string is const char32_t[].

As you see, the regular type char has UTF8 encoding

Last edited on Jul 19, 2012 at 11:53am

Jul 19, 2012 at 1:22pm

Peter87 (11251)

viliml wrote:
As you see, the regular type char has UTF8 encoding

It doesn't say that. It just says you can use char to store UTF8, but you can use char for other encodings as well.
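A small sketch of what that means in practice: the same character 'á' stored in plain char arrays under two different encodings. The names a_latin1, a_utf8 and byte_length are made up for this example, and the bytes are written as hex escapes so the result does not depend on how the source file is saved.

```cpp
#include <cstring>

// The character 'á' (U+00E1) held in ordinary char arrays.
// char is just a byte container; the encoding is a matter of convention.
const char a_latin1[] = "\xE1";      // ISO-8859-1: one byte
const char a_utf8[]   = "\xC3\xA1";  // UTF-8: two bytes

// Hypothetical helper: number of bytes before the terminating null.
std::size_t byte_length(const char* s)
{
    return std::strlen(s);
}
```

Both arrays have element type char, yet byte_length gives 1 for the first and 2 for the second. Nothing in the type says which encoding the bytes use.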

Last edited on Jul 19, 2012 at 1:23pm

Jul 19, 2012 at 1:48pm

viliml (791)

And how do you choose which encoding to use?

Jul 19, 2012 at 3:59pm

Javi OD (4)

I need to encode a simple char array in UTF-8:

int main() {
    char test[10] = "Hellóóóó";
    //Here encode test in utf8
    return 0;
}

Jul 19, 2012 at 5:37pm

Cubbi (4774)

char test[10] = "Hellóóóó";
//Here encode test in utf8

On the majority of platforms, it is already in UTF-8; see ideone.com's Linux environment, for example: http://ideone.com/iSKK2
In other cases, there are plenty of platform-specific means to do that conversion, or, in C++11, standard means as well. But, as already pointed out, adding a u8 before the opening " enforces that on all platforms/environments, if you have C++11 support.

If you don't, then try to give as much detail as possible about your platform and compiler.
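One way to check which bytes a compiler actually stored for a literal is to dump them in hex. This is only a sketch. The function name hex_bytes is made up for this example, and the code deliberately avoids C++11 so that it should build with older compilers too.

```cpp
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated hex pairs.
std::string hex_bytes(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    return out;
}
```

Printing hex_bytes("ó") gives "c3 b3" if the literal was stored as UTF-8, and "f3" if it was stored as Latin-1 or Windows-1252.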

Last edited on Jul 19, 2012 at 5:40pm

Jul 23, 2012 at 7:50am

Javi OD (4)

I'm using VS 2008 on Win32. Do you need more info?

Thanks in advance

