Operator speeds

I don't care that much about the speed of various operators, but I'm making a game engine, so speed could quickly become an issue. My first question about speed: which is faster, addition or subtraction (particularly with the ++ and -- operators)?
Neither.
Ok, another question.

Is there a faster way to draw one surface to another than nested for loops?

Also, is it faster to use a local variable, or members of a class from a pointer? Or are they the same speed..?
PiMaster wrote:
Is there a faster way to draw one surface to another than nested for loops?

Doing it in one loop perhaps.

PiMaster wrote:
Also, is it faster to use a local variable, or members of a class from a pointer? Or are they the same speed..?

I'd say the first. The second is slower because you need to dereference the pointer.
which is faster, addition or subtraction (particularly with the ++ and -- operators)?


Use pre-increment or pre-decrement (--i, ++j, etc.), especially in loops, as the opposite (i++, j++) can involve the creation of temporary data and a copy.

Even if your compiler optimizes this away it's still a good habit to be in.
Use pre-increment or pre-decrement (--i, ++j, etc.), especially in loops, as the opposite (i++, j++) can involve the creation of temporary data and a copy.

Only for more complex data types. In all other cases these are mere formalities that result in no additional code.
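
To make the "more complex data types" point concrete, here is a minimal sketch. Counter and g_copies are hypothetical names made up for this illustration; the type counts its own copy constructions so the two operator forms can be compared. For a plain int no such difference exists.

```cpp
#include <cassert>

// Hypothetical global used only to count copy constructions.
int g_copies = 0;

// Hypothetical class type standing in for a "more complex data type".
struct Counter {
    int value = 0;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++g_copies; }

    // Pre-increment: modifies in place and returns a reference. No copy.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Post-increment: has to return the old value, so it copies first.
    Counter operator++(int) {
        Counter old(*this);  // the temporary the post above refers to
        ++value;
        return old;
    }
};

// Number of copy constructions caused by n pre-increments.
int copies_from_pre(int n) {
    assert(n >= 0);
    g_copies = 0;
    Counter c;
    for (int i = 0; i < n; ++i) ++c;
    return g_copies;
}

// Number of copy constructions caused by n post-increments.
int copies_from_post(int n) {
    assert(n >= 0);
    g_copies = 0;
    Counter c;
    for (int i = 0; i < n; ++i) c++;
    return g_copies;
}
```

Because the copy constructor here has a visible side effect, the copy cannot simply be dropped. For a trivially copyable type an optimizing compiler would normally remove the unused temporary.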

I'd say the first. The second is slower because you need to dereference the pointer.

Accessing local variables is the same as accessing a class member (dereferencing stack pointer+offset). The pointer to the object needs to be stored in a register, but often just once (i.e. before a loop, but not necessarily again inside).

There's just one reliable way to tell which of several different methods is faster:
create a suitable testcase and measure the time.
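
A minimal sketch of such a test case, assuming std::chrono::steady_clock is accurate enough for the job. time_ns, sum_up and sum_down are hypothetical helpers written for this illustration; the two sum functions mirror the ++ versus -- question from the first post.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Runs `work` `iterations` times and returns the elapsed time in nanoseconds.
template <typename F>
std::int64_t time_ns(F&& work, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Candidate 1: counts upwards. `volatile` keeps the optimizer from
// collapsing the whole loop into a constant.
long long sum_up(int n) {
    volatile long long total = 0;
    for (int i = 0; i < n; ++i) total = total + i;
    return total;
}

// Candidate 2: counts downwards over the same range.
long long sum_down(int n) {
    volatile long long total = 0;
    for (int i = n - 1; i >= 0; --i) total = total + i;
    return total;
}
```

Typical use would be to call time_ns([] { sum_up(1000000); }, 100) and the same for sum_down, several times each, in an optimized build, and compare the numbers. Single runs are noisy, so treat small differences with suspicion.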
@Athar:

Are you sure?

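// Minimal definition assumed so the snippet compiles;
// the original post does not show MyClass.
struct MyClass { int member; };

void f(MyClass * p)
{
    int x;
    int & y = p->member;

    // Do these two have the same access time?
    x = 5;
    p->member = 10;

    // I'm not talking about this. This is different.
    x = 5;
    y = 10;
}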

EDIT: Or did the OP mean something like this?

void f()
{
    int x;
    MyClass c;
    
    int * p1=&x;
    int * p2=&c.member;
    
    *p1=5;
    *p2=10;    
}
I think he meant the first one.
x=5 is just one instruction (probably mov [ebp-4],5), while p->member=10 takes two (e.g. mov eax,[ebp+something], mov [eax+something],10). However, what I meant is that the first of those two instructions has to be executed just once. All following member accesses through p take just one instruction for as long as the compiler decides to keep the address in eax.
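
As a rough illustration of the "load the address once" idea, here is a sketch with two hypothetical functions. MyClass is assumed to be a plain struct with an int member, as in the snippet earlier in the thread. Whether the compiler keeps the pointer or the value in a register in the first version depends on the compiler and on what else happens in the loop; the second version makes the single load explicit in the source.

```cpp
#include <cassert>

// Assumed minimal definition; the thread never shows the real one.
struct MyClass { int member; };

// Reads p->member inside the loop. An optimizing compiler will usually
// keep p (or even the loaded value) in a register across iterations.
int sum_via_pointer(const MyClass* p, int n) {
    assert(p != nullptr);
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += p->member;
    }
    return total;
}

// Copies the member into a local once, so the loop only touches the local.
int sum_via_local(const MyClass* p, int n) {
    assert(p != nullptr);
    const int m = p->member;  // single dereference, before the loop
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += m;
    }
    return total;
}
```

Both functions return the same result. Which one is faster, if either, is something to measure rather than assume.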
Ah, I see! Though, I can imagine cases where both instructions have to be executed because eax has to change between the class-members access operations...

Mmmm... Since you are familiar with assembly, would you also happen to know how references are implemented? Are they implemented as pointers, or does the compiler hardcode the address of the referenced object every time it encounters a reference? (I think the latter would explain why they are so much more limited than pointers.) Because, now that I look at it again, maybe y=10; is not different from p->member=10; above, hahaha :D
Ah, I see! Though, I can imagine cases where both instructions have to be executed because eax has to change between the class-members access operations...

Yeah, there's that. But if the functions that are called are short and can be inlined, the compiler can still take special care that eax (or another register) is not used for other operations if it considers that to be profitable.

References are typically implemented as pointers, so it's basically just the syntax that is different. In fact, if you change your example so that p is passed by reference, the compiler would most likely still produce exactly the same code.
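
One way to check this yourself is to compile a pair like the following and compare the generated assembly (for example with your compiler's assembly output option). set_via_pointer and set_via_reference are hypothetical names, and MyClass is again assumed to be a plain struct. The C++ standard does not require references to be pointers underneath, so this is only a sketch of what mainstream compilers usually do.

```cpp
#include <cassert>

// Assumed minimal definition; the thread never shows the real one.
struct MyClass { int member; };

// Pointer version: the caller passes the address explicitly.
void set_via_pointer(MyClass* p, int value) {
    assert(p != nullptr);
    p->member = value;
}

// Reference version: same effect, different syntax. On common compilers
// the address of the object is passed here as well.
void set_via_reference(MyClass& r, int value) {
    r.member = value;
}
```

Apart from the null check, which disappears when NDEBUG is defined, the two bodies do the same work.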
Is there a faster way to draw one surface to another than nested for loops?
If you can make such assumptions as "both surfaces have the same pitch", "both surfaces have their channels in the same order", or "I don't need alpha blending", you can usually optimize pixel copy operations by calling memcpy(), or by copying whole pixels into 32-bit integers.
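
A minimal sketch of the memcpy() idea, under exactly those assumptions: same size, same 32-bit channel layout, no alpha blending. Surface, kBytesPerPixel and blit_same_format are hypothetical names for this illustration, not the API of any particular graphics library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical surface description.
struct Surface {
    std::uint8_t* pixels;  // first byte of row 0
    int width;             // in pixels
    int height;            // in rows
    std::size_t pitch;     // bytes from the start of one row to the next
};

constexpr std::size_t kBytesPerPixel = 4;  // assumes 32-bit pixels

// Copies src onto dst. Assumes equal dimensions and identical pixel format.
void blit_same_format(const Surface& src, Surface& dst) {
    assert(src.width == dst.width && src.height == dst.height);
    const std::size_t row_bytes =
        static_cast<std::size_t>(src.width) * kBytesPerPixel;
    const std::size_t rows = static_cast<std::size_t>(src.height);

    if (src.pitch == dst.pitch && src.pitch == row_bytes) {
        // Same pitch and no row padding: the image is one contiguous block.
        std::memcpy(dst.pixels, src.pixels, row_bytes * rows);
        return;
    }

    // Otherwise copy one row at a time, still without a per-pixel loop.
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst.pixels + y * dst.pitch,
                    src.pixels + y * src.pitch,
                    row_bytes);
    }
}
```

The row-by-row branch is the fallback for surfaces whose pitches differ or include padding. It replaces the inner loop of the nested version with a single call per row.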

But even code that makes none of those assumptions can be fast enough. By combining very careful coding with thread pools, I once wrote an alpha blending routine that had a theoretical top speed on my machine (Core 2 Duo 1.86 GHz) of ~500 640x480x32 blits per second, or 6.5 nanoseconds per pixel. That was four times as fast as my original version. It's all just a matter of being careful with what you write and doing a lot of testing to see what performs better.
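
For reference, the per-pixel work in an alpha blend looks roughly like the sketch below. This is not the routine described above, just a plain single-threaded version using integer math. It assumes 32-bit pixels packed as 0xAARRGGBB and an opaque destination; blend_channel and blend_pixel are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Blends one 8-bit channel: alpha 255 gives src, alpha 0 gives dst.
// The +127 rounds to nearest instead of truncating.
std::uint8_t blend_channel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha) {
    const unsigned mixed = src * alpha + dst * (255u - alpha) + 127u;
    return static_cast<std::uint8_t>(mixed / 255u);
}

// Blends one packed pixel (assumed layout 0xAARRGGBB) over an opaque
// destination pixel, using the source pixel's own alpha.
std::uint32_t blend_pixel(std::uint32_t src, std::uint32_t dst) {
    const auto alpha = static_cast<std::uint8_t>(src >> 24);
    std::uint32_t out = 0xFF000000u;  // result stays opaque
    for (int shift = 0; shift <= 16; shift += 8) {
        const auto s = static_cast<std::uint8_t>(src >> shift);
        const auto d = static_cast<std::uint8_t>(dst >> shift);
        out |= static_cast<std::uint32_t>(blend_channel(s, d, alpha)) << shift;
    }
    return out;
}

// Small self-check of the two endpoints: fully opaque and fully transparent.
void check_blend_endpoints() {
    assert(blend_pixel(0xFF112233u, 0xFF000000u) == 0xFF112233u);
    assert(blend_pixel(0x00FFFFFFu, 0xFF123456u) == 0xFF123456u);
}
```

A full blit would call blend_pixel for every pixel of every row. The division, the per-channel loop and the memory access pattern are the usual places to start when testing what performs better.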