Counting Backwards

Here is an interesting observation I have made. I am optimizing some code at the moment, and even 10 microseconds is important. The application will be run 1mil-10mil times so every ms saved per run adds up very quickly.

I found an article about optimizing C that said computers are better at counting down towards 0, then up towards an arbitrary number.

So I put this to the test. I have a double-loop, and inside that are 2 more double loops. Giving me on the order of 50mil-100mil iterations per run. I changed the loops from incrementing to decrementing. It saved up-to 100milliseconds per run.

Interesting eh :)

(100ms x 1mil Iterations) / 1000 = 100,000 Seconds Saved possibly
E.g. 27 hours.
That is interesting. I wonder why though.
There is also the point, that comparing something to 0 is faster. So when you go in reverse it compares against 0 each time, instead of a variable.

*shrug* I am getting tiny speed improvements from it. Enough to warrant it on a couple of my really complex loops.
It is because CPUs implement the instructions jge, ... for "Jump if Greater or Equal *to zero*", so in a loop, each cycle has to substract the counter from the "target value" if you don't test for zero. However, this should be optimized away if your compiler is worth it's money (...as a figure of speech, of course, g++ does a great job, too) and you use the right parameters.
However, one should keep in mind that such optimization often obfuscates the code. So before you use them switch on all optimizations and use a profiler.
Here are 2 cpp loops and and their conversions to assembly:
1
2
3
4
for(a=0;a<c;a++)
{
           b+=5;
}
(Suppose that b's adress is 100h,a's is 1004h,c's is 1008h)
1
2
3
4
5
6
7
8
9
10
   sub [1004h],1                                //substract one,othervise the loop will begin with 1,not 0
   mov [1004h],eax 
a:                                              //a (sub)function
   add [1004h],"1"                              //add 1 to a,store in accumulator
   mov [1004h],eax                              //a=result
   mov ebx,[1000h]                              //move b to base register
   add ebx,"5"                                  //add 5 to base register content
   mov [1000h],eax                              //b=result
   cmp [1004h],[1008h]                          //compare a with c
   jne a                                        //if not equal,jump to a function 

This takes 80-120 clock cycles to execute(1 cycle=1 sec/cpu speed(hz))
2nd loop:
1
2
3
4
(for a=c;a>0;a--)
{
      b=b+5;
}
Assembly:
1
2
3
4
5
6
7
8
9
10
   add      [1004h],"1"                           //add one,otherwise loop will begin with a-1
   mov      [1004h],eax
a:                                              //a (sub)function
   sub [1004h],"1"                              //substract 1 to a,store in accumulator
   mov [1004h],eax                              //a=result
   mov ebx,[1000h]                              //move b to base register
   add ebx,"5"                                 //add 5 to base register content
   mov [1000h],eax                              //b=result
   cmp [1004h],"0"                             //compare a with 0,this can be faster
   jne a                                        //if not equal,jump to a function 

This can take less,because we compare with a constant
(my assembly code can have mistakes)
Last edited on
Topic archived. No new replies allowed.