Convert chars to utf-8 hex strings

Sep 27, 2013 at 5:52pm
Hi, mates!

I am trying to convert some chars to UTF-8 strings...

Example:

std::string gethex(char c)
{
    /* EXAMPLE
       if (c == 'é')
           return "%c3%a9";

       etc...

       I need a function that converts chars like "á, é, í, ã" to UTF-8 hexadecimal strings...
    */
    return std::string(); // placeholder so the function compiles; the conversion is what I am asking for
}

std::string encode(std::string str)
{
    static std::string unreserved = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~";
    std::string r;

    for (std::string::size_type i = 0; i < str.length(); i++)
    {
        char c = str.at(i);
        if (unreserved.find(c) != std::string::npos) // unreserved characters are copied unchanged
            r += c;
        else
            r += gethex(c);
    }

    return r;
}


http://www.url-encode-decode.com does it. Choose UTF-8, type some character and click 'Url Encode'.
Last edited on Sep 27, 2013 at 5:54pm
Sep 27, 2013 at 10:31pm
Do you want URL encoding or UTF-8 encoding? They're very different.
Sep 27, 2013 at 11:42pm
> I need a function that converts chars like "á, é, í, ã" to UTF-8 hexadecimal strings...


The thing is... characters in a string are already encoded as something. They have to be. So you have to ask yourself whether or not the string is already UTF-8 encoded.

If it isn't... you'll have to find out what encoding it's in, and convert that to UTF-8.

Once you have a UTF-8 string, it's just a matter of looking at (and printing) the values as integers rather than as chars:

char example = 'a';

cout << hex << static_cast<int>(example);  // prints '61' 
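
To make that concrete for a whole string, here is a minimal sketch that dumps every byte of a std::string as two hex digits. The helper name hex_bytes is made up for this example, and it assumes the string already holds UTF-8 bytes. Each byte is read as unsigned char so that values above 0x7F do not turn negative.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of any library): returns the bytes of `s`
// as space-separated, two-digit lowercase hex values.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char byte : s) {            // unsigned char: no sign extension for bytes >= 0x80
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(byte));
        out += buf;
    }
    return out;
}
```

For a UTF-8 encoded "é" (the two bytes 0xC3 0xA9) this returns "c3 a9"; for "a" it returns "61".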
Sep 28, 2013 at 4:05pm
I want something like below:

if (c == 'é')
    return "%c3%a9";

if (c == 'á')
    return "%c3%a1";

etc
Sep 28, 2013 at 5:55pm
Yes, but 'c' in this case is just going to be an integer. The computer represents every character as an integer.

The char data type is an integer type like int, only smaller in size. The value it holds is really the numeric code of a character.

So this:

char c = 'a';

if(c == 0x61) { /* ... */ }  // <- this will be true, because 'a'==0x61 


So if all you want is to print the character as an integer... then that is the code I already posted:

char example = 'a';

cout << hex << static_cast<int>(example);  // prints '61' 


But the real question here is: how is your 'c' encoded? Is it UTF-8 or is it some other encoding?

There is no way to solve this problem unless you know what kind of characters you're dealing with. In the end you just have a bunch of numbers, and in order to do this properly you need to know what those numbers represent.


So where are you getting 'c' from? A file? The user?
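
If you are not sure what you have, a rough structural check can at least tell you when a byte string cannot be UTF-8. This is only a sketch under stated assumptions: looks_like_utf8 is a made-up name, it only checks lead and continuation byte patterns, and it does not reject overlong forms or surrogates. Text in another encoding can still pass by coincidence (pure ASCII always does), so a "true" result is a hint, not proof.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is structurally well-formed UTF-8, i.e.
// every lead byte is followed by the right number of continuation bytes.
// Simplification: overlong encodings and surrogate code points are not rejected.
bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (lead < 0x80)                extra = 0;   // 0xxxxxxx: ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;   // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;   // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;   // 11110xxx
        else return false;                           // stray continuation or invalid lead byte

        if (s.size() - i - 1 < extra) return false;  // sequence cut off at end of string
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;    // continuation bytes must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

The bytes 0xE1 0xE9 ("áé" in ISO-8859-1) fail this check, while 0xC3 0xA1 ("á" in UTF-8) passes.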
Sep 28, 2013 at 7:03pm
It is UTF-8 (hex).

In Javascript, it would be: http://pastebin.com/PaRgqfej

Here is a table: http://www.utf8-chartable.de/

Here is a sample: http://www.url-encode-decode.com/

Thanks in advance.
Sep 28, 2013 at 8:04pm
Something like this?

#include <string>
#include <sstream>
#include <iostream>

std::string hex( unsigned int c )
{
    std::ostringstream stm ;
    stm << ( c < 0x10 ? "%0" : "%" ) << std::hex << std::uppercase << c ; // always two hex digits, e.g. "%0A"
    return stm.str() ;
}

std::string url_encode( const std::string& str )
{
    static const std::string unreserved = "0123456789"
                                            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                            "abcdefghijklmnopqrstuvwxyz"
                                            "-_.~" ;
    std::string result ;

    // iterate over the bytes as unsigned char so values above 0x7F stay positive
    for( unsigned char c : str )
    {
        if( unreserved.find(c) != std::string::npos ) result += c ; // unreserved: copy as is
        else result += hex(c) ; // everything else: percent-encode the byte
    }

    return result ;
}

int main()
{
    // C++11 to C++17; in C++20 a u8 literal has type const char8_t[] and needs a conversion
    std::string test = u8"Hello World! á, é, í, ã" ;
    std::cout << test << '\n'
               << url_encode(test) << '\n' ;
}

http://ideone.com/ssgW1h
Sep 28, 2013 at 8:24pm
It gives hex output, but the result is not UTF-8 hex.

In UTF-8 encoding, "á" is "%c3%a1", not "%FFFFFFE1".
Sep 28, 2013 at 8:37pm
> In UTF-8 encoding, "á" is "%c3%a1"

It does give %C3%A1. See the output generated here: http://ideone.com/kmeE7O

To get lowercase hex digits, change line 8 (the stm << statement in hex()), replacing std::uppercase with std::nouppercase:
// stm << '%' << std::hex << std::uppercase << c ;
stm << '%' << std::hex << std::nouppercase << c ;


> not "%FFFFFFE1".

Treat each byte in the utf-8 encoded string as an unsigned char;
the default char may be a signed integral type.
for( unsigned char c : str ) { /* ... */ }
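
Here is a small sketch of where "%FFFFFFE1" comes from. The helper names are made up for the illustration, and the "ffffffe1" result assumes a 32-bit int and a two's-complement platform. The same byte 0xE1 is widened to int once as a signed char and once as an unsigned char:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats an int in lowercase hex, like `cout << hex << value`.
std::string as_hex(int value)
{
    std::ostringstream stm;
    stm << std::hex << value;
    return stm.str();
}

// A signed char holding the byte 0xE1 has the value -31; widening it to int
// keeps the sign, so the hex output shows the full width of the negative int.
std::string hex_of_signed(signed char c)     { return as_hex(static_cast<int>(c)); }

// An unsigned char holding 0xE1 has the value 225, which widens to a small positive int.
std::string hex_of_unsigned(unsigned char c) { return as_hex(static_cast<int>(c)); }
```

With these, hex_of_unsigned(0xE1) gives "e1", while hex_of_signed(static_cast<signed char>(0xE1)) gives "ffffffe1" under the assumptions above.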
Sep 28, 2013 at 9:08pm
Here is my code:

main.cpp : http://pastebin.com/DA2g16LW

encode.h : http://pastebin.com/1xp6eBpS

It does compile. The problem is that the encoding does not work.

For example, you can run the project, press CTRL+SPACE, type 0, press Enter, type 2, press Enter, open notepad, type something and press F4.

SFML is needed.
Sep 28, 2013 at 9:39pm
> Here is my code:
> SFML is needed.

1. Write a simple text based (write to stdout) program to test your encode.h - something similar to the snippet I had posted.

2. If it does not work, post the (strictly non-SFML) code here, and we can have a look at it.
Sep 28, 2013 at 10:02pm
http://ideone.com/6r96IU

&q= is the important part.

It works fine there (Ideone).

But, when I compile using GCC, the result is:
%e1%e9%ed%f3%fa
Sep 28, 2013 at 10:13pm
Are you using an IDE like CodeBlocks?
If so, save your source file(s) with UTF-8 encoding. Menu => Edit => File Encoding => UTF-8

If not, just use notepad: Menu => File => Save As => Encoding: UTF-8
Last edited on Sep 28, 2013 at 11:00pm
Sep 28, 2013 at 11:06pm
Now it works... But only when I type the string directly in the cpp file.

If I get the clipboard text (via code) and attempt to translate it, the conversion does not work well (%e1%e9 ...).
Last edited on Sep 28, 2013 at 11:15pm
Sep 29, 2013 at 4:54am
> Now it works... But only when I type the string directly in the cpp file.

It works when the text in question is UTF-8 encoded.


> If I get the clipboard text (via code) and attempt to translate it, the conversion does not work well

It does not work when the text in question is not UTF-8 encoded.
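
The bytes %e1%e9%ed%f3%fa look like "áéíóú" in a single-byte encoding such as ISO-8859-1. If that guess is right (it is an assumption; check what encoding your clipboard call actually returns), a conversion along these lines could be run before the URL encoding. The name latin1_to_utf8 is made up for this sketch, and it handles ISO-8859-1 only; Windows-1252 differs in the range 0x80 to 0x9F.

```cpp
#include <string>

// Hypothetical helper: treats every byte of `in` as an ISO-8859-1 code point
// (an assumption about the input) and re-encodes it as UTF-8.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    for (unsigned char byte : in) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);                  // ASCII maps to itself
        } else {
            // Code points 0x80..0xFF need two UTF-8 bytes: 110000xx 10xxxxxx
            out += static_cast<char>(0xC0 | (byte >> 6));    // lead byte with the top two bits
            out += static_cast<char>(0x80 | (byte & 0x3F));  // continuation byte with the low six bits
        }
    }
    return out;
}
```

For example, the single byte 0xE1 becomes the two bytes 0xC3 0xA1, which then percent-encode to %C3%A1.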