Hello everyone, and thanks for your valuable time :-)
I would like to know how to make the most of a 32-bit or 64-bit CPU architecture (if that is possible at all). As I understand it, a 32-bit system can transfer data between RAM and registers in blocks of 4 bytes (32 bits) per clock cycle, and a 64-bit system in blocks of 8 bytes (64 bits). For this example I will assume a 32-bit system.
Let's say I have an array of chars:
char myArray[100] = {...values...};
I would like to do some computations on these values:
for (int i = 0; i < 100; i++)
{
    // ...computations on myArray[i]...
}
I would like the processor to use its full 32-bit width. Since 1 char = 1 byte, I imagine something like this:
1) one clock : LOAD char[i], char[i+1], char[i+2], char[i+3] from RAM to registers
2) some clocks : do the computations on char[i], char[i+1], char[i+2], char[i+3]
3) one clock : STORE the results computed from char[i], char[i+1], char[i+2], char[i+3] from registers to RAM
and avoid something like this:
1) one clock : LOAD char[i]
2) one clock : compute with char[i]
3) one clock : STORE result from computation with char[i]
4) one clock : LOAD char[i+1]
5) one clock : compute with char[i+1]
6) one clock : STORE result from computation with char[i+1]
7) etc...
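To make the two schemes concrete, here is a minimal sketch of what I have in mind. It rests on assumptions that are not part of my real code: the "computation" is simply flipping bit 5 of every byte (XOR with 0x20), chosen only because it acts on each byte independently, and the function names flip_bytewise and flip_wordwise are made up for illustration. The word-wise version copies 4 chars into a std::uint32_t with std::memcpy (which, as far as I know, compilers usually turn into a single load), processes them together, and writes them back.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Second scheme: one char per iteration (load, compute, store).
// The XOR with 0x20 is a hypothetical stand-in for the real computation.
void flip_bytewise(char* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] ^= 0x20;
    }
}

// First scheme: four chars per iteration, packed into one 32-bit word.
void flip_wordwise(char* data, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        std::uint32_t word;
        std::memcpy(&word, data + i, sizeof word);  // load 4 chars at once
        word ^= 0x20202020u;                        // same mask in every byte
        std::memcpy(data + i, &word, sizeof word);  // store 4 chars at once
    }
    // Handle the 0 to 3 leftover chars one at a time.
    for (; i < n; ++i) {
        data[i] ^= 0x20;
    }
}
```

Both functions should produce the same result for any length; the only intended difference is how many chars move between memory and registers per iteration. This trick only works directly because the example operation never carries between bytes, which would not hold for, say, an addition.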
So my question is: is there a way (code design, compiler intrinsics or hints, ...) to get the first behavior? Or does the compiler do this automatically according to the target platform (32-bit / 64-bit)?