Pointer or entire struct? Speed question

To speed up a thread, I want to write its loop so that it saves every millisecond possible.
Let's say I have a small structure: is returning it via pointer faster than returning it by value and copying it into another structure?
TMyStruct foo; //global
TMyStruct function1()
{
 return foo;
}

TMyStruct* function2()
{
return &foo;
}

TMyStruct My = function1();
TMyStruct *pMy = function2();
A decent compiler will use RVO, so return by value would probably end up being faster than returning a pointer. Although the difference isn't likely to be measurable in milliseconds.
If you have to do that like a bajillion times per second and real-time performance is a concern, consider inlining that function (only if it's actually this small though).
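A minimal sketch of the inlining suggestion. The members of `TMyStruct` are hypothetical (the question never shows them), and `loop_body` is an invented name for whatever the thread's loop does. Marking a function `inline` and defining it where the caller can see the body only makes inlining possible; whether the compiler actually does it depends on the optimizer and its settings.

```cpp
#include <cassert>

// Hypothetical layout: the original post does not show TMyStruct's members.
struct TMyStruct
{
    int a;
    int b;
};

TMyStruct foo = { 1, 2 }; // global, as in the question

// Both accessors are defined where the caller can see their bodies, so an
// optimizing compiler is free to replace each call with a direct access.
inline TMyStruct function1() { return foo; }
inline TMyStruct* function2() { return &foo; }

// Stand-in for one iteration of the thread's loop (hypothetical).
int loop_body()
{
    TMyStruct copy = function1(); // a separate copy of the global
    TMyStruct* p = function2();   // no copy, just the address
    assert(p == &foo);            // the pointer aliases the global itself
    return copy.a + p->b;
}
```

Once both calls are inlined, the remaining difference is whether the struct is copied at all, not how the function returns it.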
> Let's say I have a small struct ...
> is returning it via pointer faster than returning it normally?

Depends on what you do with the return value.

If the result is used anonymously, or no copy of the object is required at the call site, return by address or reference would be faster.
#include <iostream>

struct A
{
    explicit A( int v = 23 ) : x(v) {}
    A( const A& that ) : x(that.x) { std::cout << "A::copy_constructor\n" ; }
    ~A() { std::cout << "A::destructor\n" ; }
    const int x ;
};

A object ;

A return_by_value() { std::cout << "return_by_value\n" ; return object ; }
const A* return_by_address() { std::cout << "return_by_address\n" ; return &object ; }
const A& return_by_reference() { std::cout << "return_by_reference\n" ; return object ; }

int main() // use anonymously
{
    { std::cout << "---------\n" ; std::cout << return_by_value().x << '\n' ; }
    { std::cout << "---------\n" ; std::cout << return_by_address()->x << '\n' ; }
    { std::cout << "---------\n" ; std::cout << return_by_reference().x << '\n' ; }
    std::cout << "---------\nend of program\n" ;
}

Output:
---------
return_by_value
A::copy_constructor
23
A::destructor
---------
return_by_address
23
---------
return_by_reference
23
---------
end of program
A::destructor



int main() // use return value (object, address or reference) 
{
    { std::cout << "---------\n" ; A temp = return_by_value() ; std::cout << temp.x << '\n' ; }
    { std::cout << "---------\n" ; const A* temp = return_by_address() ; std::cout << temp->x << '\n' ; }
    { std::cout << "---------\n" ; const A& temp = return_by_reference() ; std::cout << temp.x << '\n' ; }
    std::cout << "---------\nend of program\n" ;
}

Output:
---------
return_by_value
A::copy_constructor
23
A::destructor
---------
return_by_address
23
---------
return_by_reference
23
---------
end of program
A::destructor




If a copy of the object is required at the call site, there would be no difference; copy elision would eliminate gratuitous copying.
int main() // use a copy of the object
{
    { std::cout << "---------\n" ; A temp = return_by_value() ; std::cout << temp.x << '\n' ; }
    { std::cout << "---------\n" ; A temp = *return_by_address() ; std::cout << temp.x << '\n' ; }
    { std::cout << "---------\n" ; A temp = return_by_reference() ; std::cout << temp.x << '\n' ; }
    std::cout << "---------\nend of program\n" ;
}

Output:
---------
return_by_value
A::copy_constructor
23
A::destructor
---------
return_by_address
A::copy_constructor
23
A::destructor
---------
return_by_reference
A::copy_constructor
23
A::destructor
---------
end of program
A::destructor
#include <iostream>
#include <ctime>
 
using namespace std;

struct tinyStruct
{
	bool a;
} tiny;

struct smallStruct
{
	long double a;
} small;

tinyStruct foo11()
{
	return tiny;
}

tinyStruct* foo12()
{
	return &tiny;
}

smallStruct foo21()
{
	return small;
}

smallStruct* foo22()
{
	return &small;
}

const tinyStruct& foo13()
{
	return tiny;
}

tinyStruct* const& foo14()
{
	return &tiny;
}

const smallStruct& foo23()
{
	return small;
}

smallStruct* const& foo24()
{
	return &small;
}

#define time double(clock())/CLOCKS_PER_SEC
#define n 100000000

int main()
{
	long long i=0;
	double b, a=time;
	for (; i<n; ++i) foo11();
	b=time-a;
	cout<<"Tiny: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo13();
	b=time-a;
	cout<<"Const tiny&: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo12();
	b=time-a;
	cout<<"Tiny*: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo14();
	b=time-a;
	cout<<"Tiny* const&: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo21();
	b=time-a;
	cout<<"Small: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo23();
	b=time-a;
	cout<<"Const small&: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo22();
	b=time-a;
	cout<<"Small*: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) foo24();
	b=time-a;
	cout<<"Small* const&: "<<b<<endl;
}
Tiny: 0.563
Const tiny&: 0.55
Tiny*: 0.54
Tiny* const&: 0.58
Small: 1.13
Const small&: 0.54
Small*: 0.54
Small* const&: 0.58

So, even when the size is as little as 1 byte, using pointers is faster, and with larger and larger structs the cost of return-by-value gets bigger, while the cost of return-by-address stays constant. But I think references are actually near 0... I'll try that out and then edit.
EDIT: Edited as I said I would, and obviously references are as fast as pointers, but references to pointers are slower (as fast as pointers to pointers?).
Shouldn't you do:
tinyStruct bar;
[...]
for(;i<n;++i) bar = foo11();

?
Otherwise you don't actually copy the struct into another. Is this unnecessary for the test?

Also, I forgot to say that I'm using C, so I can only use pointers, not references.

OK, this is your question:
is returning it via pointer faster than returning it normally?
You asked whether it is faster to RETURN the value via pointer or by value, not whether it is faster to COPY it via a returned pointer or a returned value. But OK, I'll check that out too.
EDIT:
#include <iostream>
#include <ctime>
 
using namespace std;

struct tinyStruct
{
	bool a;
} tiny, bar1;

struct smallStruct
{
	long double a;
} small, bar2;

tinyStruct foo11()
{
	return tiny;
}

tinyStruct* foo12()
{
	return &tiny;
}

smallStruct foo21()
{
	return small;
}

smallStruct* foo22()
{
	return &small;
}

#define time double(clock())/CLOCKS_PER_SEC
#define n 100000000

int main()
{
	long long i=0;
	double b, a=time;
	for (; i<n; ++i) bar1=foo11();
	b=time-a;
	cout<<"Tiny: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) bar1=*foo12();
	b=time-a;
	cout<<"Tiny*: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) bar2=foo21();
	b=time-a;
	cout<<"Small: "<<b<<endl;
	i=0;
	a=time;
	for (; i<n; ++i) bar2=*foo22();
	b=time-a;
	cout<<"Small*: "<<b<<endl;
}
Tiny: 0.614
Tiny*: 0.55
Small: 1.47
Small*: 0.561
You are probably not interested in how fast it is in debug mode so you should turn on optimizations when doing the benchmark.
I get the following output from above program.
Tiny: 0
Tiny*: 0.16
Small: 0
Small*: 0.26

The compiler is able to optimize away the loops that return the object by value. In a real program you probably do something more useful that is harder to optimize so this benchmark doesn't say much.
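One common way to keep an optimizer from deleting a benchmark loop is to write each result to a `volatile` object. The sketch below assumes the same `smallStruct` shape as the benchmark above; `small_obj`, `sink`, `time_by_value` and `time_by_pointer` are names invented for illustration. It uses `std::chrono::steady_clock` instead of `clock()`, which is a swap and not what the original program did.

```cpp
#include <cassert>
#include <chrono>

struct smallStruct { long double a; }; // same shape as in the benchmark above

smallStruct small_obj = { 1.5L };

smallStruct by_value() { return small_obj; }
smallStruct* by_pointer() { return &small_obj; }

// Stores to a volatile object count as observable behaviour, so the
// compiler has to perform one store per iteration and cannot drop the loop.
volatile long double sink = 0;

// Times `iterations` calls of by_value() and returns the elapsed seconds.
double time_by_value(long iterations)
{
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) sink = by_value().a;
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Same loop, but reading the member through the returned pointer.
double time_by_pointer(long iterations)
{
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) sink = by_pointer()->a;
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

This only guarantees that the stores happen. The compiler may still inline both functions and generate identical code for the two loops, so equal timings from such a test would not be surprising.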
With best optimizations, for n=10000000000 I got these results:
Tiny: 0
Tiny*: 9.542
Small: 0
Small*: 28.109
In this case optimization is your friend. Because tiny and bar1 are never modified or otherwise used, the compiler can treat them as the same object and resolve the function call completely.
struct A
{
    int x ;
    double y ;
};

A return_by_value() ;
const A* return_by_address() ;
const A& return_by_reference() ;

//with  -O3 -fomit-frame-pointer

int by_value() { return return_by_value().x ; }
/*
__Z8by_valuev:
	subl	$44, %esp
	leal	16(%esp), %eax
	movl	%eax, (%esp)
	call	__Z15return_by_valuev
	movl	16(%esp), %eax
	addl	$44, %esp
	ret
*/


int by_address() { return return_by_address()->x ; }
/*
__Z10by_addressv:
	subl	$12, %esp
	call	__Z17return_by_addressv
	movl	(%eax), %eax
	addl	$12, %esp
	ret
*/


int by_reference() { return return_by_reference().x  ; }
/*
__Z12by_referencev:
	subl	$12, %esp
	call	__Z19return_by_referencev
	movl	(%eax), %eax
	addl	$12, %esp
	ret
*/

I have to add that if your real code is like what you gave in your first post (by that I mean the result of the function is always in the same global variable), you simply don't need to return it; you can just read the global variable after the function call.
Those zero ms are a nonsense, it's just some optimization that works only if you do nothing with that object I suppose.
I cannot take that test into account then, because I obviously do stuff with that structure in my actual program.
So passing a pointer is faster, right? I'm a bit confused by all these posts. ^ ^


@JLBorges
It would be useful if you added some explanation to your post, so I can better understand what you wanted to show us. What I can see there is that the return by value has two more instructions in its assembly code, so I suppose it's slower, but still you didn't copy the returned value.

I changed my first post to make my question more clear.
Those zero ms are a nonsense, it's just some optimization that works only if you do nothing with that object I suppose.
No! First of all, it's 0 seconds, and second of all, it's RVO, which calls the constructor directly, without copying.
> .. so I suppose it's slower,

Yes


> but still you didn't copy the returned value.

One copy is made when we return (the copy of) a non-temporary object by value. Copy elision (RVO or NRVO in this case) would just eliminate needless multiple copying.

Typical implementation of return by value with:
struct A { int i ; /* ... */ }; A aa ;

Our code:
A return_by_value() { return aa ; } // in translation unit one

int foo() // in translation unit two
{
    return return_by_value().i ;          
}

int bar() // in translation unit two
{
    A a = return_by_value() ;
    return a.i ;          
}


Augmented code generated (typical, expositional):
// A return_by_value()
void xxx_return_by_value_yyy( void* raw_memory_for_object ) 
{ 
    // return aa ;
    ::new (raw_memory_for_object) A(aa) ; // copy_construct an A into the raw memory
    return ;  
}

int xxx_foo_yyy() // int foo()
{
    // return return_by_value().i ;

    char memory[ sizeof(A) ] ; // allocate temporary memory for an object
    xxx_return_by_value_yyy(memory) ;
    A* pa = reinterpret_cast<A*>(memory) ;

    int rv = pa->i ;
    pa->A::~A() ; // destroy the anonymous temporary
    return rv ;              
}


int xxx_bar_yyy() // int bar()
{
    // A a = return_by_value() ;

    char memory[ sizeof(A) ] ; // allocate temporary memory for object 'a'
    xxx_return_by_value_yyy(memory) ;
    A& a = *reinterpret_cast<A*>(memory) ;
    
    // return a.i ;
   
    int rv = a.i ;
    a.A::~A() ; // destroy a
    return rv ;              
}


It is easy to see that this is more expensive than returning a pointer/reference, unless:
a. A has a trivial copy constructor (bit-wise copy)
b. A has a trivial destructor (do nothing)
c. sizeof(A) <= sizeof(pointer to A) (object of type A can be placed in a register)
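The three conditions can be checked at compile time with `<type_traits>`. The sketch below reads them as all being required together, which is an assumption about the list above; `cheap_to_return_by_value` and the three example structs are hypothetical names, not part of the posted code.

```cpp
#include <cassert>
#include <type_traits>

// True when T meets all three conditions: trivial copy constructor,
// trivial destructor, and no larger than a pointer to T.
template <typename T>
constexpr bool cheap_to_return_by_value()
{
    return std::is_trivially_copy_constructible<T>::value
        && std::is_trivially_destructible<T>::value
        && sizeof(T) <= sizeof(T*);
}

struct Tiny { bool a; };     // trivial and pointer-sized or smaller
struct Big { double d[4]; }; // trivial, but larger than a pointer
struct Traced                // small, but copy and destroy are user-provided
{
    int x;
    Traced() : x(0) {}
    Traced(const Traced& that) : x(that.x) {}
    ~Traced() {}
};

// Runs the checks for the three example types.
void check_examples()
{
    assert(cheap_to_return_by_value<Tiny>());
    assert(!cheap_to_return_by_value<Big>());
    assert(!cheap_to_return_by_value<Traced>());
}
```

The size test is only a rough proxy for "fits in a register"; the actual rule depends on the platform's calling convention.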

Now I actually think that my question was stupid. Passing a pointer should obviously be faster.
But all these posts confused me lol.
I don't know what RVO and NRVO are, and I still don't get why those results show zero seconds.
But I'm pretty sure that if you pass a pointer instead of copying an object (allocating more memory and doing the copy) it's faster. I really doubt it isn't: how can creating a new 10 MB structure be as fast as or faster than passing a pointer to it lol?
Even if the structure is 2 bytes, the only exceptions are the ones explained in the post before this one.
bump
If you create something inside a function that you want to return, it is often best to return it by value. If you return the object by reference or pointer, you will have to find a way to store the object so that it stays valid after the function has ended. If you create it with new, the caller will have to remember to delete it, which is easy to forget.

RVO and NRVO (not sure about the difference) are optimizations done by most compilers nowadays. They make the object created inside the function get constructed directly in the place in memory where the returned value will be stored at the call site, removing the need for any copying.
Foo bar()
{
	Foo foo(10);
	....
	return foo;
}
...
Foo foo = bar(); // no copying needed 
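On the difference between the two: RVO usually refers to returning an unnamed temporary, and NRVO to returning a named local object such as `foo` above. Since C++17 the unnamed case must be elided; the named case is still an optional optimization. A minimal sketch that counts copies, assuming a hypothetical `Foo` with a counting copy constructor (the counter, the three factory functions and `copies_made_by` are invented for illustration):

```cpp
#include <cassert>

struct Foo
{
    static int copies; // number of copy-constructor calls so far
    int value;
    explicit Foo(int v) : value(v) {}
    Foo(const Foo& other) : value(other.value) { ++copies; }
};
int Foo::copies = 0;

Foo global_foo(7);

// RVO candidate: the returned object is an unnamed temporary.
Foo make_unnamed() { return Foo(10); }

// NRVO candidate: the returned object is a named local variable.
Foo make_named()
{
    Foo foo(10);
    foo.value += 1;
    return foo;
}

// Not elidable: the source is a global that must stay intact, so the
// copy constructor runs at least once.
Foo copy_global() { return global_foo; }

// Calls one of the factories and reports how many copies that call made.
int copies_made_by(Foo (*factory)())
{
    assert(factory != nullptr);
    Foo::copies = 0;
    Foo result = factory();
    (void)result; // only the copy count matters here
    return Foo::copies;
}
```

With a typical optimizing compiler `copies_made_by(make_unnamed)` and `copies_made_by(make_named)` are expected to report no copies, while `copies_made_by(copy_global)` reports at least one. The global case is the one that matches the original question.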


When passing function arguments that are costly to copy you should prefer passing them by reference.
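A short sketch of that advice, assuming a hypothetical `BigStruct` that owns a `std::vector` so that copying it is visibly expensive. `make_big`, `count_by_value` and `count_by_reference` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type that is costly to copy: the vector owns heap memory.
struct BigStruct
{
    std::vector<int> samples;
};

// Builds a BigStruct holding n samples.
BigStruct make_big(std::size_t n)
{
    BigStruct b;
    b.samples.assign(n, 1);
    assert(b.samples.size() == n);
    return b;
}

// By value: when called with an existing object, the whole vector is
// copied (allocation plus element copy) before the function body runs.
std::size_t count_by_value(BigStruct s) { return s.samples.size(); }

// By const reference: only an address is passed and nothing is copied.
std::size_t count_by_reference(const BigStruct& s) { return s.samples.size(); }
```

Both functions return the same result; the by-reference version simply skips the copy, and `const` keeps the callee from modifying the caller's object.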
OK, got it.