I cannot use fgetc to read special characters like the Swedish å, ä and ö. I would like to read the file character by character using fgetc, print the decimal ASCII value of each character in a console window, and then write the result to a file called ascii.txt.
So my question is: how do I specify the read encoding for special characters like the Swedish å, ä and ö, or better, the default encoding of the system the program runs on? I guess I should use UTF-7/UTF-8, or the "default" (Unicode) encoding, but how can I specify that when using fgetc?
I've tried to google this and I haven't found the answer. I would be really happy if you could help me.
I'm using a very simple ANSI .txt file as an example. My whatever.txt contains only aåäöb.
The result is "97 -27 -28 -10 98", written to the console and to the a.txt file.
The desired result is "97 134 132 148 98" (general extended ASCII).
Yes, I'm very new to programming in C/C++, so I would appreciate it if you explained it that way too.
Plain old char is signed on many compilers, including Visual C++, so it only holds integer values in [-128,127]; every byte above 127 (observe that all extended ASCII values are greater than 127!) comes out negative. unsigned char holds integer values in [0,255].
EDIT:
This actually isn't the problem. I tried your code, first with plain char, and in the Visual Studio 2005 debugger the second character read into "text" displayed correctly (å), albeit with the integer value -27. I then ran it with unsigned char (casting "text" to char in the first if statement and in the while condition), and the second character again displayed correctly (å), this time with the integer value 229 (NOT 134). The encoding of the text file is to blame: 229 is the code for å in Windows-1252, the "ANSI" code page of Western European (including Swedish) Windows, while 134 is its code in the old DOS code page 437. I'm not that good with encodings, but I guess you'll have to change the encoding of the text file somehow for the values to show up as expected.
Well, thank you for your effort in trying to solve this, but it's not the solution.
So you mean that C++ isn't capable of handling the ANSI MIME type "text/plain"?
Does anybody else know how to convert my .txt file into a proper encoding?
Thank you for your help in gaining a better understanding of character encoding. I was hoping there was a method to always get the right ASCII decimal value (independent of country and character set), but I now realize it's a lot more complicated than that.
The underlying problem behind my question is now solved.
My main forms application, which asks you to choose a .txt file from an OpenFileDialog, uses UTF-7 encoding for the StreamReader routine and the default (Unicode) or UTF-8 encoding for the StreamWriter routine to present a copy of the .txt file in use.
(I will use this copy later to replace the .txt file that is to be scrambled.)
In this case I don't need the right ASCII decimal value (as I thought) to get the desired result, because I "decrypt" straight back from the Unicode decimal codes: each character read by fgetc is stored in a wchar_t, and its Unicode code is converted back to the desired character.
When I convert the Unicode integers back into an array, I get the same character array as the original file.
This code works perfectly, but I don't know how it works with other character sets and code tables. I will try it on the Cyrillic, Greek and Chinese tables as soon as possible.
I guess I have to learn a lot more about this. I will come back with an answer.