encoding of std::string

Aug 17, 2009 at 5:11pm
Hi.

I'm writing a small program that uses the curl library (http://curl.haxx.se/) to interact with a PHP-based service. In some functions I need to send POST data in URL-encoded format (like http://www.w3schools.com/TAGS/ref_urlencode.asp), and to do this I simply convert the code of each char in the string to its hex value.
For example, "abc" is converted to the string "%61%62%63". But when I converted the character 'è', I discovered that in std::string it uses 2 bytes, as in UTF-8.
In another test I stored 'è' in a string, but this time it used only 1 byte.
So, what kind of encoding is used?
Is the string's internal encoding decided at runtime according to the system's default encoding?


Another, less important question: why is it that if I do

cout << (char)138;

it prints exactly the character 'è', but with

cout <<'è';

it prints a strange character (it is converted to -24)?
(Sorry for my English.)
Thanks.

Aug 17, 2009 at 5:53pm
But when I converted the character 'è', I discovered that in std::string it uses 2 bytes, as in UTF-8.
During this test, your input was converted to UTF-8 automatically before getting to std::string.

In another test I stored 'è' in a string, but this time it used only 1 byte.
During this test, your input remained in ISO-8859-1 and got pushed onto the std::string like that.

std::string doesn't have the concept of encodings. It just stores whatever is passed to it.
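
To illustrate the point, here is a minimal sketch of the byte-by-byte percent-encoding described in the question. The function name `percent_encode_all` is hypothetical, and the sketch assumes every byte should be escaped (as in the "abc" example), which is more than URL encoding strictly requires. It simply shows that the output depends on which bytes ended up in the std::string, not on any encoding the string knows about.

```cpp
#include <cstddef>
#include <string>

// Percent-encode every byte of `input`, as the question does
// ("abc" -> "%61%62%63"). Hypothetical helper, not part of libcurl.
std::string percent_encode_all(const std::string& input) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(input.size() * 3);
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not treated as negative.
        const unsigned char byte = static_cast<unsigned char>(input[i]);
        out += '%';
        out += hex[byte >> 4];    // high nibble
        out += hex[byte & 0x0F];  // low nibble
    }
    return out;
}
```

With this sketch, `percent_encode_all("\xC3\xA8")` (the two UTF-8 bytes of 'è') gives "%C3%A8", while `percent_encode_all("\xE8")` (the single ISO-8859-1 byte) gives "%E8". The server has to be told, or has to assume, which of the two encodings it is receiving.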

Another, less important question: why is it that if I do
cout << (char)138;
it prints exactly the character 'è', but with
cout <<'è';
it prints a strange character (it is converted to -24)?
Something is wrong with your locale or with how your editor writes the source file. 138 and -24 are not complements of each other (-24 as a signed char is the byte 232, or 0xE8).
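
One possible explanation, offered only as an assumption about the poster's setup: the source file was saved in ISO-8859-1 (or Windows-1252), where 'è' is the byte 0xE8, while the console was using code page 437 or 850, where 'è' is the byte 0x8A (138). The sketch below only does the arithmetic that relates these values; the helper names are hypothetical.

```cpp
// Value a byte takes when stored in a signed 8-bit char (two's complement).
int as_signed_char(unsigned char byte) {
    return byte >= 128 ? static_cast<int>(byte) - 256 : static_cast<int>(byte);
}

// Byte value (0..255) that corresponds to a possibly negative char value.
// Conversion to unsigned char is defined as reduction modulo 256.
int as_unsigned_byte(int char_value) {
    return static_cast<unsigned char>(char_value);
}
```

Here `as_signed_char(0xE8)` is -24 and `as_unsigned_byte(-24)` is 232, so the -24 is consistent with an ISO-8859-1 'è' in the source, while 138 is a different byte that only happens to display as 'è' in those console code pages.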
Aug 17, 2009 at 7:14pm

During this test, your input was converted to UTF-8 automatically before getting to std::string.

During this test, your input remained in ISO-8859-1 and got pushed onto the std::string like that.


Is there a way to ensure that all input is always converted to the same format?

Something is wrong with your locale or with how your editor writes the source file. 138 and -24 are not complements of each other.


Maybe it is because the IDE (Code::Blocks) saves the test file in UTF-8 (it contains many unusual characters like $, §, etc. that I used for testing) and gcc does not read the source correctly during compilation?
Aug 17, 2009 at 8:33pm
Is there a way to ensure that all input is always converted to the same format?
Not that I know of. Hmm... Maybe if you knew the system's locale.

Maybe it is because the IDE (Code::Blocks) saves the test file in UTF-8 (it contains many unusual characters like $, §, etc. that I used for testing) and gcc does not read the source correctly during compilation?
Maybe something like that, but 'è' in UTF-8 is C3 A8.
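
If the input is known to be ISO-8859-1, one way to normalize it before URL-encoding is to convert it to UTF-8 by hand. This is only a sketch under that assumption (it cannot detect the encoding, and it would double-encode input that is already UTF-8); the function name `latin1_to_utf8` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Convert a string whose bytes are ISO-8859-1 code points into UTF-8.
// Assumes the input really is ISO-8859-1; no detection is attempted.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(latin1[i]);
        if (c < 0x80) {
            // ASCII is encoded identically in both encodings.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For the byte 0xE8 this produces the two bytes C3 A8, matching the UTF-8 form of 'è' mentioned above.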

The console (the Windows console, specifically) is not reliable for printing non-ASCII characters.