[std::string] issue with special characters

Hello!

I'm working on a program that must parse natural-language words from a file. Special characters like 'ñ', 'ç', or accented vowels ('á', 'ä', ...) must be recognized as characters contained in words and not as delimiters.

I'm using fstream and string, since the string::find_first_of and string::find_first_not_of methods worked great for splitting words, but I've run into trouble because those special characters don't seem to have a single, clear representation.

For instance, I have a constant string containing all the characters that can be part of a word, which includes a lot of special and accented characters, and I use that string combined with find_first_not_of to delimit words. Let's say my string is
string esAcceptedChars = "1234567890abcdefghijklmnñopqrstuvwxyzçáéíóúàèìòùäëïöüâêîôûABCDEFGHIJKLMNÑOPQRSTUVWXYZÇÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÄËÏÖÜ";

I ran some tests and it worked under Unix, but when I tested it under Windows, I found that those special characters were not properly detected. So I debugged my program by comparing every character read from disk with the characters in the constant string, and I found out that a character has a different representation when it is read from disk than when it is written as a constant in the source code.

For example, the string "qué" read from disk is different from the string declared as string str="qué".

This is really delaying my work, and I'm clueless at this point. I've been exploring other options like wstring, but since the problem is that I can't reliably compare a special character declared as a constant with one read from disk, I guess that wouldn't be very useful.

So I'm here asking the experts for help. I hope you can excuse my terrible English and, hopefully, provide some help.

Thanks in advance.

Check the character encoding of your source file and of your input file
Hi Bazzy, thanks for your reply.
I really appreciate your help, but I've never checked the character encoding of a file in C++, and I don't know if it would help: even with that information, I don't know how to modify my program to behave one way or another depending on the encoding.

If you could give me some more hints on this it would be very helpful :)
What Bazzy said.

This is why you don't use non-ASCII characters in source code. It's non-portable.
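To represent the string "qué" use instead "qu\xE9". E9 is the ISO-8859-1 (Latin-1) byte value for 'é', which is also its Unicode code point (U+00E9). This only matches your input if the file is encoded in ISO-8859-1 or Windows-1252; in UTF-8 the same character is the two bytes C3 A9.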
"áéíóú" -> "\xE1\xE9\xED\xF3\xFA"
http://en.wikipedia.org/wiki/ISO-8859-1
Excuse me for the inappropriate language, but it f****ng worked.

You guys have saved me a lot of time and you have my gratitude, and made me realize I still have a lot to learn :)