Interesting question. Not sure. UTF-8?? I think Visual Studio knows when a Unicode character is present in a source file and then asks the user to switch the source file encoding.
But I guess that'll be just for saving the source file. For the actual program execution, I really don't know. Will the compiler respect the UTF-8 nature of the source file and emit the extra bytes needed to encode the character in RAM? I recommend that you do a small test and find out.
I believe all that option does is make the Windows functions and typedefs use either char or wchar_t, e.g. _T("some text"). The typedef Windows uses is TCHAR, or something like that. The same goes for functions: MessageBox(), for example, resolves to either MessageBoxA() or MessageBoxW() depending on the setting.
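Roughly, the mechanism looks like this (a simplified sketch of the idea, not the actual Windows headers):
// Simplified sketch of the TCHAR / _T() mechanism (the real definitions live in <tchar.h> / <windows.h>):
#ifdef UNICODE
    typedef wchar_t TCHAR;           // wide build: TCHAR is wchar_t
    #define _T(x) L##x               // _T("some text") becomes L"some text"
    #define MessageBox MessageBoxW   // API names resolve to the W versions
#else
    typedef char TCHAR;              // ANSI build: TCHAR is plain char
    #define _T(x) x                  // _T("some text") stays "some text"
    #define MessageBox MessageBoxA   // API names resolve to the A versions
#endif

const TCHAR* s = _T("some text");    // compiles as char* or wchar_t* depending on the setting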
That's a different story, xerzi. The macro name is UNICODE and it is defined through the project properties, never in code in the case of Visual Studio.
The UNICODE setting will never change the meaning of a char. It will change TCHARs, but never chars. Since the question uses char and not TCHAR, I assume the OP really wants to know if the read-only string will be encoded in a particular encoding. I suggested that he/she tries it out. It is simple: Create a new project that declares that string, then examine each byte pointed to by 'c'. Is it one byte? Is it two or three bytes? Ideally, he/she should find out beforehand the actual representation of the sterling pound symbol in at least UTF-8 and UTF-16, to have some comparison criteria.
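Something like this minimal test would show it (just a sketch; what it prints depends entirely on how the file is saved and how the compiler reads it):
#include <cstdio>

int main()
{
    const char* c = "£";
    // Print every byte of the literal in hex: UTF-8 would give c2 a3,
    // Windows-1252 would give a3, and a UTF-16 string wouldn't fit in a char* at all.
    for (const char* p = c; *p != '\0'; ++p)
        std::printf("%02x ", (unsigned char)*p);
    std::printf("\n");
    return 0;
}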
Which is why that option does nothing here; if you want to use wide characters then you need to declare them as such:
// may need to include a header
const wchar_t* c = L"£";
Depending on the compiler, wchar_t can be a different size; Windows uses 16 bits. You might be better off finding a library that supports different encodings, since this differs from compiler to compiler.
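For example, a quick check (a sketch; Visual C++ prints 2 here, gcc on Linux typically prints 4):
#include <cstdio>

int main()
{
    // sizeof(wchar_t) is implementation-defined: 2 bytes (UTF-16) on Windows,
    // usually 4 bytes (UTF-32) with gcc/clang on Linux.
    std::printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t));
    return 0;
}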
I understand, but I don't think that's the point. I think the point here is: What will the final representation in RAM be? Or at least that is what I think the point is. I could be wrong and you could end up being right.
I am in a Visual Studio 2005 / Windows XP environment.
In fact, in my situation I have an API that gives me a char* pointer, and this API does not give any information about the encoding of that char*.
While debugging, the debugger view shows the character "£" correctly, but char[0] shows -65 and the rest are 0 (I am passing it a char[51] buffer).
Second, I am passing this value to another API which also takes a char*, and that API does not mention anything about its encoding either.
For testing purposes I have passed the "£" string directly to the second API.
I am not sure how to deal with a situation like this; I have not been able to understand or conclude anything about this behaviour except this---that....
Also, there would be many possibilities, but this is my current situation and I cannot go for wchar_t*.
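If I knew the actual code page I could at least convert it to something known, e.g. with the Win32 conversion functions (just a sketch; it assumes the first API returns text in the system ANSI code page, which is exactly the part I don't know):
#include <windows.h>
#include <string>

// Sketch: convert a char* assumed to be in the system ANSI code page (CP_ACP) to UTF-8.
std::string AnsiToUtf8(const char* src)
{
    // First to UTF-16 (with -1, the returned length includes the terminating '\0')...
    int wlen = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (wlen == 0) return std::string();
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(CP_ACP, 0, src, -1, &wide[0], wlen);

    // ...then from UTF-16 to UTF-8.
    int ulen = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, NULL, 0, NULL, NULL);
    if (ulen == 0) return std::string();
    std::string utf8(ulen, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, &utf8[0], ulen, NULL, NULL);
    utf8.resize(ulen - 1);            // drop the trailing '\0' from the std::string
    return utf8;
}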
Simply making it a wide char won't help, necessarily.
It comes down to 2 things:
1) How is the IDE saving the file?
2) How is the compiler interpreting the file?
Ideally, in this case, the answer to both questions would be "UTF-8" but that's not really an assumption you can make.
It's likely that both IDE and compiler options are configurable. But I'm too lazy to check how to do that right now.
The easiest way to test this is to do something like this:
const char* c = "£"; // £ sign is U+00A3, which in UTF-8 is stored as 0xC2 0xA3
// so an easy way to check to see if it's really UTF-8
// (cast to unsigned char: plain char may be signed, in which case it would never compare equal to 0xC2):
if( (unsigned char)c[0] == 0xC2 && (unsigned char)c[1] == 0xA3 )
{
    // yes, it's UTF-8
}
else
{
    // no, it's some other encoding
}
This would probably be better done with an assert or something.
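For instance (a sketch of the assert version):
#include <cassert>

int main()
{
    const char* c = "£";
    // Stop in a debug build if the literal did not come out as the UTF-8 bytes 0xC2 0xA3.
    assert((unsigned char)c[0] == 0xC2 && (unsigned char)c[1] == 0xA3);
    return 0;
}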
If you want to ensure that it's UTF-8 all around, the safe bet is this:
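For example, writing the bytes out with hex escapes so the source file encoding doesn't matter at all (a guess at the idea; treat it as a sketch):
// The pound sign written as explicit UTF-8 bytes; this does not depend on how
// the source file is saved or which charset the compiler assumes:
const char* c = "\xC2\xA3";   // U+00A3 POUND SIGN in UTF-8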
Well, it doesn't appear to be Windows-1252 (a superset of ISO 8859-1), because in that charset the pound sign is 0xA3, not the -65 (0xBF) you are seeing. What's your default non-Unicode charset in Control Panel? Maybe that's the one used by the compiler?
The setting is in Region and Language, Administrative tab, but it seems that the setting has changed now: you now select a locale, and I guess that selects the charset to use. See if the actual value stored in memory changes when you change this setting.
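A quick way to see which ANSI code page the program actually picks up after changing that setting (a sketch using GetACP from the Win32 API):
#include <windows.h>
#include <cstdio>

int main()
{
    // GetACP() returns the active ANSI code page, e.g. 1252 for Western European.
    std::printf("ANSI code page: %u\n", GetACP());
    return 0;
}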