Hi.
I'm having trouble converting UTF8 chars like Á É Ó Í Ü Ú Ñ to lowercase. tolower doesn't work, I think because it only changes ASCII chars. I tried comparing the char with the hexadecimal code of Á and, if it matches, changing it to á, but that doesn't work. I know that UTF8 special chars are encoded with 2 bytes, and the first of the chars I want to change is 0xC1.
int main () {
    unsigned char word;
    cin >> word;
    switch (word)
    {
        case 0x81 : word = 0xA1;
        case 0xC1 : word = 0xC1;
        case 0x89 : word = 0xA9;
        case 0x8D : word = 0xAD;
        case 0x93 : word = 0xB3;
        case 0x9A : word = 0xBA;
        case 0x9C : word = 0xBC;
        case 0x91 : word = 0xB1;
        default : word = tolower(word);
    }
    cout << word;
    system("PAUSE");
    return 0;
}
tolower works, but the other cases don't. Can somebody help me, please?
These are not "UTF8 chars", these are just characters that aren't part of the ASCII character set. UTF-8 is one of the many ways to encode those characters in a computer.
tolower doesn't work, I think because it only changes ASCII chars
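There are three functions to choose from: std::tolower from <locale>, std::tolower from <cctype>, and std::towlower from <cwctype>. The first one works whether your characters are stored in wide (wchar_t) or narrow (char) form, the second one only works for the narrow (char) form, and the third one only works for wide characters.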
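To make the difference concrete, here is a minimal sketch, assuming the three functions meant are std::tolower from <locale>, std::tolower from <cctype>, and std::towlower from <cwctype>. The wrapper names are made up for illustration, and the classic "C" locale is used only to keep the behaviour predictable for ASCII input.

```cpp
#include <cctype>   // std::tolower(int)
#include <cwctype>  // std::towlower(std::wint_t)
#include <locale>   // std::tolower(charT, const std::locale&)

// <locale> version: a template, so the same call accepts char and wchar_t.
// (Hypothetical wrapper names, for illustration only.)
inline char lower_locale_narrow(char c) {
    return std::tolower(c, std::locale::classic());
}
inline wchar_t lower_locale_wide(wchar_t c) {
    return std::tolower(c, std::locale::classic());
}

// <cctype> version: narrow characters only. The argument has to be
// representable as unsigned char, hence the cast.
inline char lower_cctype(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// <cwctype> version: wide characters only.
inline wchar_t lower_cwctype(wchar_t c) {
    return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
}
```

With the classic locale only ASCII letters are guaranteed to change; what happens to characters such as Á depends on the locale that is installed.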
I know that UTF8 special chars are encoded with 2 bytes, and the first of the chars I want to change is 0xC1.
This is the main source of confusion, I feel. 0xC1 is the value of 'Á' in ISO8859-1, which is a single-byte character set. You can use your old tolower() just fine:
#include <iostream>
#include <clocale>
#include <cctype>
int main()
{
    std::setlocale(LC_ALL, "en_US.iso88591"); // only now Á is 0xc1
    unsigned char big = 0xc1;
    unsigned char small = std::tolower(big);
    std::cout << std::hex << "character code was "
              << +big << " became " << +small << '\n';
}
Now, in UTF-8, the character Á is indeed two bytes, but those bytes are 0xC3 0x81. In order to tolower() that, you will have to first convert it to a wide character representation (stored in a variable of type wchar_t and has the value 0x00c1) and then use tolower() or towlower().
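A rough sketch of that round trip for a single two-byte character, in case it helps. The function names are made up for illustration, and the lowercase step assumes a UTF-8 locale (for example "en_US.UTF-8") was installed earlier with std::setlocale; without one, std::towlower may leave non-ASCII characters unchanged.

```cpp
#include <cwctype>
#include <string>

// Decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into its code point.
inline wchar_t decode_two_byte_utf8(unsigned char lead, unsigned char cont) {
    return static_cast<wchar_t>(((lead & 0x1F) << 6) | (cont & 0x3F));
}

// Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
inline std::string encode_two_byte_utf8(wchar_t cp) {
    std::string out;
    out += static_cast<char>(0xC0 | ((cp >> 6) & 0x1F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}

// Lowercase one two-byte UTF-8 character by going through wchar_t.
// Assumption: the caller has already selected a suitable locale.
inline std::string lower_two_byte_utf8(unsigned char lead, unsigned char cont) {
    const wchar_t wide = decode_two_byte_utf8(lead, cont);
    const wchar_t lowered =
        static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(wide)));
    // If the result no longer fits in two bytes, keep the original character.
    if (lowered < 0x80 || lowered > 0x7FF) return encode_two_byte_utf8(wide);
    return encode_two_byte_utf8(lowered);
}
```

For the bytes 0xC3 0x81 the decode step gives 0x00C1, and encoding 0x00E1 gives back 0xC3 0xA1. For whole strings, the standard std::mbstowcs and std::wcstombs functions do the same conversion under the current locale.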
Sorry for my contradictory post. In my program, tolower works.
I think it will work with wchar. I have to use those special characters from UTF-8, which are encoded with two bytes.
Thank you for the answers, I think my problem is solved.
Hi. I have to read a string and change its uppercase chars to lowercase. I read the characters from that string one by one, but how can I change only the second byte of the special chars? For example, Á = xc3 x81 should become á = xc3 xA1.
I want to implement a function to do the changes, and I have to use the UTF-8 encoding.
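One possible way to do that at the byte level, as a minimal sketch only: it assumes the input is valid UTF-8 and that only ASCII letters and the accented capitals in the range U+00C0..U+00DE (which includes Á É Í Ó Ú Ü Ñ) need converting. All of those start with the byte 0xC3, and the lowercase form has the second byte 0x20 higher. The function name is made up for illustration.

```cpp
#include <string>

// Lowercase ASCII letters and the capitals U+00C0..U+00DE in a UTF-8 string.
// U+00D7 (the multiplication sign, bytes C3 97) has no lowercase form and is
// skipped. Every other byte is copied unchanged.
inline std::string utf8_to_lower_latin1(const std::string& in) {
    std::string out = in;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(out[i]);
        if (b >= 'A' && b <= 'Z') {
            // Plain ASCII capital: 'A'..'Z' -> 'a'..'z'.
            out[i] = static_cast<char>(b + 0x20);
        } else if (b == 0xC3 && i + 1 < out.size()) {
            const unsigned char next = static_cast<unsigned char>(out[i + 1]);
            if ((next & 0xC0) == 0x80) {  // a real continuation byte
                if (next <= 0x9E && next != 0x97) {
                    // Only the second byte changes: C3 81 -> C3 A1, etc.
                    out[i + 1] = static_cast<char>(next + 0x20);
                }
                ++i;  // the continuation byte has been handled
            }
        }
    }
    return out;
}
```

This does not depend on the locale, but it only covers that one block of characters. Anything outside it (Greek, Cyrillic, Ő, and so on) is left untouched, so the wide character route is still the more general one.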
Andy
I am using Ubuntu. The system("PAUSE") in the first post is there because I tried this on Windows first, but I have to do it on Ubuntu.
I use C++. I compile with gpp.