To speed up a thread, I want to write its loop so that it saves every millisecond possible.
Let's say I have a small structure. Is returning it via pointer faster than returning it by value and copying it to another structure?
TMyStruct foo; //global
TMyStruct function1()
{
return foo;
}
TMyStruct* function2()
{
return &foo;
}
TMyStruct My = function1();
TMyStruct *pMy = function2();
A decent compiler will use RVO, so return by value would probably end up being faster than returning a pointer. Although the difference isn't likely to be measurable in milliseconds.
If you have to do that like a bajillion times per second and real-time performance is a concern, consider inlining that function (only if it's actually this small though).
So, even when the size is as little as 1 byte, using pointers is faster, and with larger and larger structs the cost of pass-by-value grows, while the cost of pass-by-address stays constant. But I think references are actually near 0... I'll try that out and then edit.
EDIT: Edited as I said I would, and obviously references are as fast as pointers, but pointer references are slower (as fast as pointer pointers?).
is returning it via pointer faster than returning it normally?
You asked whether it is faster to RETURN the value via pointer or by value, not whether it is faster to COPY it via a returned pointer or a returned value. But OK, I'll check that out too.
EDIT:
You are probably not interested in how fast it is in debug mode so you should turn on optimizations when doing the benchmark.
I get the following output from above program.
Tiny: 0
Tiny*: 0.16
Small: 0
Small*: 0.26
The compiler is able to optimize away the loops that return the object by value. In a real program you probably do something more useful that is harder to optimize so this benchmark doesn't say much.
In this case optimization is your fiend. The reason is that tiny and bar1 are never modified or used, so the compiler treats them as the same object and resolves the function call completely.
I have to add that if your real code is like what you gave in your first post (by that I mean the result of the function is always in the same global variable), you simply don't need to return it; you can just read the global variable after the function call.
Those zero ms are a nonsense, it's just some optimization that works only if you do nothing with that object I suppose.
I cannot take that test into account then, because I obviously do stuff with that structure in my actual program.
So passing a pointer is faster, right? I'm a bit confused by all these posts. ^ ^
@JLBorges
It would be useful if you added a more complete explanation to your post, so I can better understand what you wanted to show us. What I can see there is that the return by value has two more instructions in its assembly counterpart, so I suppose it's slower, but you still didn't copy the returned value.
I changed my first post to make my question more clear.
Those zero ms are a nonsense, it's just some optimization that works only if you do nothing with that object I suppose.
No! First of all, it's 0 seconds, and second of all, it's RVO, which calls the constructor directly, without copying.
One copy is made when we return (the copy of) a non-temporary object by value. Copy elision (RVO or NRVO in this case) would just eliminate needless multiple copying.
Typical implementation of return by value with: struct A { int i ; /* ... */ }; A aa ;
Our code:
A return_by_value() { return aa ; } // in translation unit one
int foo() // in translation unit two
{
return return_by_value().i ;
}
int bar() // in translation unit two
{
A a = return_by_value() ;
return a.i ;
}
Roughly what the compiler generates for it:
// A return_by_value()
void xxx_return_by_value_yyy( void* raw_memory_for_object )
{
// return aa ;
::new (raw_memory_for_object) A(aa) ; // copy_construct an A into the raw memory
return ;
}
int xxx_foo_yyy() // int foo()
{
// return return_by_value().i ;
alignas(A) char memory[ sizeof(A) ] ; // allocate suitably aligned temporary memory for an object
xxx_return_by_value_yyy(memory) ;
A* pa = reinterpret_cast<A*>(memory) ;
int rv = pa->i ;
pa->A::~A() ; // destroy the anonymous temporary
return rv ;
}
int xxx_bar_yyy() // int bar()
{
// A a = return_by_value() ;
alignas(A) char memory[ sizeof(A) ] ; // allocate suitably aligned temporary memory for object 'a'
xxx_return_by_value_yyy(memory) ;
A& a = *reinterpret_cast<A*>(memory) ;
// return a.i ;
int rv = a.i ;
a.A::~A() ; // destroy a
return rv ;
}
It is easy to see that this is more expensive than returning a pointer/reference, unless:
a. A has a trivial copy constructor (bit-wise copy)
b. A has a trivial destructor (do nothing)
c. sizeof(A) <= sizeof(pointer to A) (object of type A can be placed in a register)
Now I actually think that my question was stupid. Passing a pointer should obviously be faster.
But all these posts confused me lol.
I don't know what RVO and NRVO are, and I still don't get why those results show zero seconds.
But I'm pretty sure that if you pass a pointer instead of copying an object (allocating other memory and doing the copy) it's faster. I really doubt it isn't: how can creating a new 10 MB structure be as fast as or faster than passing a pointer to it lol?
Even if the structure is 2 Bytes, the only possibilities are the ones explained in the post before this one.
If you create something inside a function that you want to return, it is often best to return it by value. If you return the object by reference or pointer, you will have to find a way to store the object so that it stays valid after the function has ended. If you create it with new, the caller will have to remember to delete it, which is easy to forget, etc.
RVO or NRVO (not sure about the difference) are optimizations done by most compilers nowadays. They make it so that the object created inside the function is created directly in the place in memory where the returned value will be stored at the call site, removing the need for any copying.