Here is an interesting observation I have made. I am optimizing some code at the moment, and even 10 microseconds is important. The application will be run 1mil-10mil times so every ms saved per run adds up very quickly.
I found an article about optimizing C that said computers are better at counting down towards 0, then up towards an arbitrary number.
So I put this to the test. I have a double-loop, and inside that are 2 more double loops. Giving me on the order of 50mil-100mil iterations per run. I changed the loops from incrementing to decrementing. It saved up-to 100milliseconds per run.
Interesting eh :)
(100ms x 1mil Iterations) / 1000 = 100,000 Seconds Saved possibly
E.g. 27 hours.
It is because CPUs implement the instructions jge, ... for "Jump if Greater or Equal *to zero*", so in a loop, each cycle has to substract the counter from the "target value" if you don't test for zero. However, this should be optimized away if your compiler is worth it's money (...as a figure of speech, of course, g++ does a great job, too) and you use the right parameters.
However, one should keep in mind that such optimization often obfuscates the code. So before you use them switch on all optimizations and use a profiler.
Here are 2 cpp loops and and their conversions to assembly:
1 2 3 4
for(a=0;a<c;a++)
{
b+=5;
}
(Suppose that b's adress is 100h,a's is 1004h,c's is 1008h)
1 2 3 4 5 6 7 8 9 10
sub [1004h],1 //substract one,othervise the loop will begin with 1,not 0
mov [1004h],eax
a: //a (sub)function
add [1004h],"1" //add 1 to a,store in accumulator
mov [1004h],eax //a=result
mov ebx,[1000h] //move b to base register
add ebx,"5" //add 5 to base register content
mov [1000h],eax //b=result
cmp [1004h],[1008h] //compare a with c
jne a //if not equal,jump to a function
This takes 80-120 clock cycles to execute(1 cycle=1 sec/cpu speed(hz))
2nd loop:
1 2 3 4
(for a=c;a>0;a--)
{
b=b+5;
}
Assembly:
1 2 3 4 5 6 7 8 9 10
add [1004h],"1" //add one,otherwise loop will begin with a-1
mov [1004h],eax
a: //a (sub)function
sub [1004h],"1" //substract 1 to a,store in accumulator
mov [1004h],eax //a=result
mov ebx,[1000h] //move b to base register
add ebx,"5" //add 5 to base register content
mov [1000h],eax //b=result
cmp [1004h],"0" //compare a with 0,this can be faster
jne a //if not equal,jump to a function
This can take less,because we compare with a constant
(my assembly code can have mistakes)