Compiler optimization question

Can anyone help me explain why one a compiler might perform the following optimization. Assuming all variables are declared and initialized appropriately.

1
2
3
4
//Before
for (j=0; j< 8192; j++)
   for (i = 0; i < 200000; i++)
      x[i][j] = x[i][j] + 1;


1
2
3
4
//After
for (i=0; i < 2000000; i++)
   for (j=0; j< 8192; j++)
      x[i][j] = x[i][j] + 1;


Please be specific if you can, I'm not sure why it does this.
IF it did it, then it is smart enough to realize that the [j]s are sequential in memory (fewer page faults, etc). doing it the other way, you skip 2million *sizeofthing bytes every iteration, causing you to need a different page of memory each iteration... ouch!
if the dimensions of i were much smaller, like 100, ... would it do it? I dunno ... at some point its small enough hops not to matter.

these loops can be condensed to 1d if you just increment everything. depends on the full dimensions of x, not just the touched dimensions?
Last edited on
> these loops can be condensed to 1d if you just increment

If x is a two-dimensional array, and the second dimension is known,
the optimiser knows how to condense it (and use SIMD instructions).

https://gcc.godbolt.org/z/rK84ca7sb
@GroovyJack

You seem to really be struggling with some basic stuff here. I realize it is new and non-obvious, but I have to wonder: did you really not get any good notes for your course? Surely the curriculum has listed reading materials for you.

If you would just read through them once or twice you could skip a lot of these really basic questions.


I’m not trying to be unkind here. If you want to develop software, one of your most important skills will be the ability to be given a hefty manual and access to the code and familiarize yourself with them in just a few days.


[edit]
And by all means, ask questions. Just show us that you’ve put some effort into understanding them before asking. Right now you are just using us like a software version of Cliff Notes.
Last edited on
Try it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#include <iostream>
#include <ctime>
using namespace std;

int main()
{
   static int x[20000][8192]{};
   int i, j;

   clock_t t1 = clock();

   for (j=0; j< 8192; j++)
      for (i = 0; i < 20000; i++)
         x[i][j] = x[i][j] + 1;

   clock_t t2 = clock();

   for (i=0; i < 20000; i++)
      for (j=0; j< 8192; j++)
         x[i][j] = x[i][j] + 1;

   clock_t t3 = clock();
   
   cout << "First : " << double( t2 - t1 ) / CLOCKS_PER_SEC << " s\n";
   cout << "Second: " << double( t3 - t2 ) / CLOCKS_PER_SEC << " s\n";
}


First : 3.713 s
Second: 0.271 s



But the compiler must be brighter still, because if you add -O3 optimisation it gives
First : 0.202 s
Second: 0.056 s

Last edited on
and the 1d runs about 5% faster on my computer in O3

1
2
3
4
5
	int * ip = (int*)x;
   do
   {
	    (*ip)++; ip++;   
   } while(ip < (int*)x+20000*8192);  


and interestingly as a benchmark, the above is only a small amount (not even %s) slower than a benchmark
static char x2[20000][8192]{};
--
memset(x2,1,20000*8192);
--
which does the same thing if they are zero initialized as in the example.
Last edited on
one of your most important skills will be the ability to be given a hefty manual and access to the code and familiarize yourself with them in just a few days


I wish! IMO the days of having manual(s) that document the system/languages etc are long gone. These days it's all on-line. I well remember back in the 1980's having shelves full of manuals for Data General, DEC, Systime etc. Everything you ever wanted to know about these systems was in these manuals - somewhere. The original DOS came with manuals that even contained bios code listings. IMO sadly these are no longer supplied. If you're lucky you get a 'leaflet' explaining how to plug it in (or whatever) and links to on-line info - which often is neither detailed or complete.
it varies. My complaint is most docs throw the house on you.
me: I want to use your library to compute A*B
their example: get the prime factorization of A and B, load it into ourweirdcontainer, call prep for multiply ... and 2 pages of code later ... they just cannot seem to give simple get started level examples in most systems.
Topic archived. No new replies allowed.