Guarantees about the encoding of 'A'

EDIT: Please see rephrased question in next post!

Am I right that the C++ standard (C++03) does not guarantee any of the following:

1) The multi-byte character encoding of the glyph 'A' in the classic C locale has a value equal to 65.

2) The wide character encoding of the glyph 'A' in the classic C locale has a value equal to 65.

3) The multi-byte and wide character encodings of the glyph 'A' in the classic C locale have equal values.


Note: By "the glyph 'A'" I mean the Latin capital letter 'A', and not some other glyph that looks like 'A'. It is the 'A' that occurs in the basic source character set.

Note: The multi-byte character encoding of 'A' is guaranteed by the C++ standard to use a single byte, so we can talk meaningfully about the value of the encoding.
I think I stated my question in an overcomplicated fashion. Let me rephrase:


Am I right that:

1) The standard allows for the value of 'A' to differ from 65 in the multi-byte character encoding of the C locale.

2) The standard allows for the value of 'A' to differ from 65 in the wide character encoding of the C locale.

3) The standard allows for the value of 'A' to differ between the multi-byte and the wide character encodings of the C locale.


Note: By "the standard" I mean the 2003 revision of the C++ standard (C++03).

Note: By "the glyph 'A'" I mean the Latin capital letter 'A', and not some other glyph that looks like 'A'. It is the 'A' that occurs in the basic source character set.

Note: The multi-byte character encoding of 'A' is guaranteed by the C++ standard to use a single byte, so we can talk meaningfully about the value of the encoding.
why is that important?
It is important to me because I'm trying to understand what the standard requires of the encodings used in the C locale. I thought an answer to a specific question like this could help me towards that goal.
In particular, I'd like to know whether the following program always writes out a single byte with value 65 when given a single byte with value 65 as input.

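#include <iostream>
using namespace std;

int main()
{
  char c;
  cin >> c;        // read one narrow character
  wchar_t c2 = c;  // integral conversion of the numeric value
  wcout << c2;     // write it as a wide character
}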


It certainly does that on my system, but does it on any implementation that complies with C++03?
The answer to all the questions in your second post, and to the one in your last post, is "yes."

See the section on Character Sets under Lexical Conventions. The mapping of 'A' to a numeric value is implementation-defined. (Hence, C++ works on EBCDIC systems.) However, the implementation must give you some help there. Read the caveats.

What this means is that you should not assume anything about the value of the letter 'A'. A character literal has an ordinary integral type, so you can easily get its actual value.

    int x = 'A';

The program in your last post is a special case, because it does not actually depend on how 'A' is encoded: if the input has a value of 65, then c must have a value of 65.
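
To illustrate the point about c: assigning a char to a wchar_t is a plain integral conversion, so a non-negative value such as 65 is carried over as the same number, whatever character it stands for. If what you want is the wide character that corresponds to a narrow one, the ctype facet has widen() for that. A rough sketch, where copy_value and widen_char are made-up names and I am assuming the classic C locale:

```cpp
#include <locale>

// Hypothetical helper: plain integral conversion, as in the program above.
// The number is copied; no character-set mapping takes place.
wchar_t copy_value(char c) { return c; }

// Hypothetical helper: asks the classic "C" locale's ctype facet for the
// wide character corresponding to the narrow character c.
wchar_t widen_char(char c)
{
  const std::ctype<wchar_t>& ct =
      std::use_facet<std::ctype<wchar_t> >(std::locale::classic());
  return ct.widen(c);
}
```

On an implementation where 'A' and L'A' have different values, the two functions would return different results for 'A'. On common ASCII-based systems they agree.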

Hope this helps.