This is the output from a simple program I wrote to dump memory. The purpose of the program is to help me understand what C++ is doing for (to) me with respect to arrays, pointers, etc.
If int x = 0x7172737A:
using printf("%X", ...): x is 7172737A, &x is 26FC50 : printf working as expected
using printf("%c %X", ...): x is z, &x is 26FC50 : I understand that if I treat an int as a char, the high-order bits are ignored
If char c[] = "ABCDefgh12345":
using printf("%X", ...): c[0] is 41, c is 26FC38 : as expected
using printf("%c %X", ...): c[0] is A, &c is 26FC38 : as expected
using putchar(c[0]): c[0] is A : as expected
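The "high-order bits are ignored" observation is a value conversion: for %c, printf converts its int argument to unsigned char. A minimal sketch of the same conversion (low_byte is a hypothetical name, not from the original program) shows that it keeps the low 8 bits of the value whatever the machine's byte order:

```cpp
#include <cstdint>

// Hypothetical helper: converting to unsigned char keeps only the low-order
// 8 bits of the value (value modulo 256). This is arithmetic on the value,
// not a read of the first byte in memory, so it yields 0x7A ('z') for
// 0x7172737A on little- and big-endian machines alike.
unsigned char low_byte(std::uint32_t value) {
    return static_cast<unsigned char>(value);
}
```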
using putchar(c[i]) and printf("%X", &c[i]) for i = 0 through 13, I obtain the following, all as expected:
A 26FC38
B 26FC39
C 26FC3A
D 26FC3B
e 26FC3C
f 26FC3D
g 26FC3E
h 26FC3F
1 26FC40
2 26FC41
3 26FC42
4 26FC43
5 26FC44
26FC45
Note that in the dump, at address 26FC38, the 4 bytes holding the first 4 elements of the array c[] are REVERSED. The same reversal shows up in each of the following 4-byte groups.
The print of c[i] worked exactly as expected, but if you look in the dump, the element at 26FC38 is D, not A!
I am very confused about how C++ assigns memory to array elements and how it recovers the elements when they are addressed properly. What am I missing here?
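A minimal sketch of the loop described above, assuming the format strings lost from the post were "%c" and "%X"; print_elements is a hypothetical name. It prints addresses with %p and a void* cast, which is the portable way to print a pointer (passing a pointer to %X is undefined behavior, even though it happened to work on this 32-bit system):

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: prints each element of c[0..n-1] next to its
// address, like the putchar/printf loop described above.
// Returns the distance in bytes between the first and last element.
std::ptrdiff_t print_elements(const char* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // c[i] means *(c + i): element i is stored i bytes after c[0].
        std::printf("%c %p\n", c[i], static_cast<const void*>(&c[i]));
    }
    return n == 0 ? 0 : &c[n - 1] - &c[0];
}
```

Called as print_elements(c, sizeof c) with char c[] = "ABCDefgh12345", it walks all 14 elements, including the terminating '\0' at index 13 (which is why the last output line above shows an address but no visible character), and returns 13: the elements sit at consecutive, increasing addresses.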
Sorry the hex dump is so hard to read. As I type it, it is perfectly aligned, but posting it skews the alignment.
I could barely understand anything in the first part of your post. In fact, the hex dump was easier to understand.
(Here is the dump again, in a [tt] monospace block:)
026FC2C 6A6B6C6D CCCCCCCC CCCCCCCC 44434241 jklm........DCBA 0026FC3C
026FC3C 68676665 34333231 CCCC0035 CCCCCCCC hgfe4321...5.... 0026FC4C
026FC4C CCCCCCCC 7172737A CCCCCCCC A3BAC2B5 ....qrsz........ 0026FC5C
026FC5C 0026FCAC 00072918 00000001 000F19C0 .&....)......... 0026FC6C
026FC6C 00281D60 A3BAC245 00000000 00000000 .(.`...E........ 0026FC7C
026FC7C 7FFDF000 017D7840 00000000 00000000 .....}x@........ 0026FC8C
026FC8C 00270000 00000000 0026FC70 000000D4 .'.......&.p.... 0026FC9C
026FC9C 0026FCF0 0007109B A39BB0F9 00000000 .&.............. 0026FCAC
026FCAC 0026FCB4 0007275F 0026FCC0 76294911 .&....'_.&..v)I. 0026FCBC
026FCBC 7FFDF000 0026FD00 7751E4B6 7FFDF000 .....&..wQ...... 0026FCCC
I assume you dumped that memory yourself by copying chars into 32-bit integers. If that's the case, then you just saw what endianness is all about.
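To see that effect in isolation, here is a sketch that reads four bytes of memory as one 32-bit integer, the way a word-oriented dump does. read_word is a hypothetical name; std::memcpy is used instead of a pointer cast to stay clear of alignment and aliasing problems.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the 4 bytes starting at p as a single 32-bit
// integer, using whatever byte order the CPU uses for integers.
std::uint32_t read_word(const unsigned char* p) {
    std::uint32_t w;
    std::memcpy(&w, p, sizeof w);
    return w;
}
```

On x86, read_word applied to the bytes of "ABCD" (41 42 43 44 in memory) returns 0x44434241, which a word-oriented dump prints as 44434241: exactly the DCBA seen at 26FC38.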
Suppose we have the number 0x12345678 (305,419,896 in decimal). Depending on the CPU, this number will be stored differently in memory:
x86 CPUs store it as [lower memory ... 78 56 34 12 ... higher memory]. This is known as little endian, because the little (least significant) end of the number comes first. This seems counter-intuitive only at first.
A PowerPC running in big-endian mode, as in pre-Intel Macs, would have stored it as [lower memory ... 12 34 56 78 ... higher memory]. This is known as big endian.
There's another, weirder byte order called middle endian, which would be [lower memory ... 34 12 78 56 ... higher memory]; this is how the PDP-11 stored 32-bit values.
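A small sketch, assuming a fixed-width std::uint32_t, that copies 0x12345678 into a byte array to show which layout the current machine uses; memory_bytes and is_little_endian are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the bytes of a 32-bit value in the order
// they sit in memory, lowest address first. On x86 this gives 78 56 34 12
// for 0x12345678; a big-endian PowerPC would give 12 34 56 78.
std::array<unsigned char, 4> memory_bytes(std::uint32_t value) {
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Hypothetical helper: true when the least significant byte (0x78) is
// stored at the lowest address, i.e. the machine is little endian.
bool is_little_endian() {
    return memory_bytes(0x12345678u)[0] == 0x78;
}
```

In C++20, std::endian::native from <bit> answers the same question at compile time, so a runtime check like this is mainly useful on older compilers.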
"The problem of dealing with data in different representations is sometimes termed the NUXI problem. This terminology alludes to the issue that a value represented by the byte-string 'UNIX' on a big-endian system may be stored as 'NUXI' on a PDP-11 middle-endian system; UNIX was one of the first systems to allow the same code to run on, and transfer data between, platforms with different internal representations."
Helios,
Thank you very much for your reply. I admit I had never heard of "endianness". My prior experience with reading hex dumps was on an IBM 360/370 (big endian, according to Wikipedia). Lord knows I had the opportunity to read many of them.
I can't tell whether printf is restructuring each 4 bytes to "correct" for little endianness or not.
Sorry my "code" was so obscure. What I was attempting to show was that I created an int x=0X7172737A and printed it with printf which gave it back to me in the same format. In the hexdump this integer shows the same sequence of 7172737A. I was happy with all this. Your answer implied that I should see 7A737271 in the hexdump. Perhaps it really is in that order but printf is taking care of it for me. I'll discover the answer eventually.
But when I created a char array, each group of 4 bytes was reversed in the hex dump, which would fit the little-endian argument. I obviously need to think about this for a while. I really appreciate your reply, which gives me a starting point for more thinking.
You assigned 0x7172737A to the int. You assigned it through the language. You didn't set each byte manually. In reality, the memory contents of the integer were [7A 73 72 71].
But when you read that data back as an integer, it was translated back into its original value, 0x7172737A, and then you printed this value (printf() of course uses a representation that humans can understand).
Let me repeat this, so it's perfectly clear:
1. You have the value 0x7172737A in a 32-bit integer x. (Since an integer is a single unit rather than a set of individual bytes, how it is laid out in memory is irrelevant at this point.)
2. If you did this: char *v=(char *)&x;
you'd find that v[0]==0x7A, v[1]==0x73, v[2]==0x72, v[3]==0x71 in your current architecture. This is little endian.
3. Like I said, integers are a single unit. Regardless of the endianness, the CPU is consistent on how it reads and writes integers from and to memory. So, if you did this: int *px=&x;
*px==x, because the CPU is reading and writing integers with a consistent endianness.
Single bytes are of course not affected by endianness, so casting the pointer to a char * (or, even better, an unsigned char *) will always give you the actual memory contents.
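Points 2 and 3 can be checked with a short sketch. raw_bytes and from_bytes are hypothetical helpers; the byte view uses unsigned char, as suggested above.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the bytes of x into out[0..3], lowest address
// first, by viewing x through an unsigned char pointer (always permitted in
// C++). On x86 this yields 7A 73 72 71 for x == 0x7172737A.
void raw_bytes(const std::uint32_t& x, unsigned char out[4]) {
    const unsigned char* v = reinterpret_cast<const unsigned char*>(&x);
    for (int i = 0; i < 4; ++i) {
        out[i] = v[i];
    }
}

// Hypothetical helper: reassembles an integer from bytes in memory order.
// The CPU uses the same byte order for reading as for writing, so feeding
// it the output of raw_bytes gives back the original value on any machine.
std::uint32_t from_bytes(const unsigned char in[4]) {
    std::uint32_t x;
    std::memcpy(&x, in, sizeof x);
    return x;
}
```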
The lesson here is:
* If you're going to dump memory, dump it one byte at a time, otherwise you'll likely have endianness problems. Since bytes don't have endianness, the same dumping method will work anywhere.
* If you really want to dump using multi-byte types, you have to perform some corrections on the value:
If the machine is little endian, the bytes [12 34] will be read as 0x3412, and [12 34 56 78] as 0x78563412. That may be what you need, but most likely it isn't, so you'll have to reverse the bytes in each integer to get an accurate dump.
If the machine is big endian, you're in luck. You can just print the integer confident that those are the actual contents.
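Both recommendations can be sketched as follows: hex_line dumps memory one byte at a time, and swap_bytes32 reverses a 32-bit value for the case where you insist on dumping whole words on a little-endian machine. Both names are hypothetical; C++23 also offers std::byteswap in <bit> for the second job.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats n bytes starting at p as hex, one byte at a
// time, lowest address first, followed by an ASCII column ('.' for bytes
// that are not printable). It never reads multi-byte values, so the output
// reflects the actual memory contents on any byte order.
std::string hex_line(const void* p, std::size_t n) {
    const unsigned char* b = static_cast<const unsigned char*>(p);
    std::string hex, ascii;
    char buf[4];  // "XX " plus the terminating '\0'
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(b[i]));
        hex += buf;
        ascii += std::isprint(b[i]) ? static_cast<char>(b[i]) : '.';
    }
    return hex + ascii;
}

// Hypothetical helper: reverses the byte order of a 32-bit value,
// e.g. 0x78563412 -> 0x12345678.
std::uint32_t swap_bytes32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For the int from the original post, hex_line(&x, sizeof x) gives "7A 73 72 71 zsrq" on x86, which is the real memory layout, while printing the integer itself with %X still shows 7172737A.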
By the way, no big-endian PCs have been manufactured since Apple switched to Intel.