Standard conversion for Unicode symbols

Jun 9, 2011 at 9:30am
Hi

I am pretty new to C++. I am working on a program that will eventually do some type of text conversion on a Unicode text file.

My question is this: in ASCII, if I declare a char as follows:
char Symbol;

Symbol = 33;
cout << Symbol << endl;

It would display ! on the screen.

Is there a way to do the same, but for Unicode? For example:

char Symbol;

Symbol = U+003E;
cout << Symbol << endl;

hoping to display > on the screen.

The aim would be to eventually loop through all 95156 Unicode symbols to automatically either create a text file with a switch statement or save them in a table to reference what conversion should happen.

Thanks for all the help in advance.
Last edited on Jun 9, 2011 at 10:54am
Jun 9, 2011 at 10:11am
wchar_t symbol = L'!';
Jun 9, 2011 at 10:17am
First of all,
a text file with my switch statement
Don't. Use a lookup table.
Second, a char is not suitable for storing Unicode characters.
Third, "U+003E" is just a fancy way of saying 62. If you simply assign 62 to an integer, it'll have the same effect. Since it's just notation, 0x3E means the same.
Finally,
The aim would be to eventually loop through all 95156 Unicode symbols to automatically
You can't create a useful lookup table using a fully automatic method. What code page are you trying to convert?
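To make the third point concrete: a code point such as U+003E is only a number, and what ends up on screen depends on how that number is encoded into bytes. The following is a minimal sketch, assuming the output is meant to be UTF-8 and that a C++11 compiler is available for `char32_t`. The function name `encode_utf8` is made up for this illustration and is not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a single Unicode code point as UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    // Valid code points are U+0000..U+10FFFF, excluding the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // One byte: identical to ASCII.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `encode_utf8(0x3E)` and `encode_utf8(62)` both give the one-byte string `>`, which a plain `cout` can print. Code points above U+007F produce several bytes, and whether they display correctly depends on the terminal being set to UTF-8.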
Last edited on Jun 9, 2011 at 10:17am
Jun 9, 2011 at 11:35am
Hi, thanks for the replies.

kbw suggested:

wchar_t symbol = L'!';

but it seems to only yield the corresponding integer value of the character between the apostrophes.

helios:

I agree with you that what I am trying to do may not be optimized, but at the moment I am trying to get a "proof of concept". I am not sure what
What code page are you trying to convert?
you are talking about, but my end goal is to try to do text compression on a Unicode text file (a new idea I think might work).

Lastly, I might be misinterpreting how to use the code kbw suggested, but if I enter
wchar_t symbol = 33;
and then
cout << symbol << endl;
It will only display 33. If I replace 33 with L'33' it displays 51.

Thanks again for all the replies

regards Overklog
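A likely explanation for the numbers seen above: `cout` is a narrow (`char`) stream and has no overload that prints a `wchar_t` as a character, so before C++20 the value is promoted to an integer and printed as 33. C++20 deletes that overload, so the same line no longer compiles there. Wide characters are meant for wide streams such as `wcout`. The 51 is probably the code of the character '3': `L'33'` is a multi-character literal whose value is implementation-defined, and this compiler appears to have kept one digit. The sketch below uses string streams so the results can be checked; the helper names are made up for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// What a wide stream (std::wcout, std::wostringstream) writes for a wchar_t:
// the character itself.
std::wstring via_wide_stream(wchar_t c) {
    std::wostringstream out;
    out << c;
    return out.str();
}

// What the narrow stream in the post effectively printed: the numeric code.
// The cast is explicit so the intent is clear and the code also compiles
// under C++20.
std::string via_numeric_code(wchar_t c) {
    std::ostringstream out;
    out << static_cast<unsigned long>(c);
    return out.str();
}

// Hypothetical self-check tying the helpers to the values from the post.
void check_thread_values() {
    assert(via_wide_stream(33) == L"!");      // wide stream shows the symbol
    assert(via_numeric_code(33) == "33");     // narrow stream showed the number
    assert(via_numeric_code(L'3') == "51");   // '3' has code 51
}
```

In a real program the equivalent of `via_wide_stream` is simply `std::wcout << symbol << std::endl;`. Characters outside ASCII may additionally need the stream's locale to be set before they display, which depends on the platform.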
Jun 9, 2011 at 12:15pm
text compression on Unicode text
What exactly do you need a table for, then?
What type of "conversion" are you trying to accomplish?
Last edited on Jun 9, 2011 at 12:18pm