As stated in various sources, notably the holy standard, a compiler can, under certain conditions, avoid copying an object created and returned by a function. It does so by allocating the needed space in advance in the caller's stack frame and passing its address to the callee. This feature is called Return Value Optimisation (RVO).
An example may clarify this better:
#include <iostream>
#include <vector>

using namespace std;

// Builds a vector locally and returns it by value (candidate for named RVO).
vector<int> my_example(){
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    return v;
}

int main(){
    auto v = my_example();
    cout << "sizeof v: " << sizeof(v) << endl;
    for(int i = 0; i < v.size(); i++){
        cout << "[" << i << "] " << v[i] << endl;
    }
}
Now follow my steps within the debugger:
(lldb) breakpoint set -b my_example
Breakpoint 1: where = cpp_helloworld`my_example() + 24 at main.cpp:10, address = 0x0000000000000e08
(lldb) r
Process 13612 launched: '/home/god/workspace/eclipse/cpp_helloworld/Debug/cpp_helloworld' (x86_64)
Process 13612 stopped
* thread #1, name = 'cpp_helloworld', stop reason = breakpoint 1.1
frame #0: 0x0000555555554e08 cpp_helloworld`my_example() at main.cpp:10
7 using namespace std;
8
9 vector<int> my_example(){
-> 10 vector<int> v;
11 v.push_back(1);
12 v.push_back(2);
13
(lldb) expr -- &v
(std::vector<int, std::allocator<int> > *) $0 = 0x00007fffffffe4f0
(lldb) bt
* thread #1, name = 'cpp_helloworld', stop reason = breakpoint 1.1
* frame #0: 0x0000555555554e08 cpp_helloworld`my_example() at main.cpp:10
frame #1: 0x0000555555554ed8 cpp_helloworld`main at main.cpp:19
frame #2: 0x00007ffff7157f4a libc.so.6`__libc_start_main + 234
frame #3: 0x0000555555554d0a cpp_helloworld`_start + 42
(lldb) frame select 1
frame #1: 0x0000555555554ed8 cpp_helloworld`main at main.cpp:19
16
17
18 int main(){
-> 19 auto v = my_example();
20 cout << "sizeof v: " << sizeof(v) << endl;
21 for(int i = 0; i < v.size(); i++){
22 cout << "[" << i << "] " << v[i] << endl;
(lldb) expr -- &v
(std::vector<int, std::allocator<int> > *) $1 = 0x00007fffffffe4f0
So apparently ::my_example and ::main share the same location for the vector.
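The same observation can be reproduced without a debugger. What follows is only a minimal sketch: `callee_address` and `same_storage` are names made up for this illustration, and whether the two addresses actually match depends on the compiler applying the named RVO, which it is allowed but not obliged to do.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: remembers where `v` lived inside my_example().
static std::uintptr_t callee_address = 0;

std::vector<int> my_example() {
    std::vector<int> v;
    callee_address = reinterpret_cast<std::uintptr_t>(&v);
    v.push_back(1);
    v.push_back(2);
    return v;
}

// Returns true when the callee's `v` and the caller's `v` occupy the same
// storage, i.e. when no separate object was created for the return value.
bool same_storage() {
    auto v = my_example();
    assert(v.size() == 2);  // the contents arrive intact either way
    return callee_address == reinterpret_cast<std::uintptr_t>(&v);
}
```

Calling `same_storage()` from `main` and printing the result shows whether a given compiler and set of flags elided the copy for this function.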
Back to the debugger session:
(lldb) frame select 0
(lldb) register read
General Purpose Registers:
rax = 0x00007fffffffe4f0
rbx = 0x0000000000000000
rcx = 0xfca678bc51299800
rdx = 0x00007fffffffe608
rdi = 0x00007fffffffe4f0
rsi = 0x00007fffffffe5f8
rbp = 0x00007fffffffe470
rsp = 0x00007fffffffe430
r8 = 0x00007ffff7dd5fc0 libstdc++.so.6`(anonymous namespace)::num_get_w
r9 = 0x00007ffff7dcada0 libstdc++.so.6`typeinfo for std::locale::facet
r10 = 0x000000000000033f
r11 = 0x00007ffff716e6d0 libc.so.6`__GI___cxa_atexit
r12 = 0x0000555555554ce0 cpp_helloworld`_start
r13 = 0x00007fffffffe5f0
r14 = 0x0000000000000000
r15 = 0x0000000000000000
rip = 0x0000555555554e08 cpp_helloworld`my_example() + 24 at main.cpp:10
rflags = 0x0000000000000206
cs = 0x0000000000000033
fs = 0x0000000000000000
gs = 0x0000000000000000
ss = 0x000000000000002b
ds = 0x0000000000000000
es = 0x0000000000000000
The frame of ::my_example spans [%rsp, %rbp] = [0x00007fffffffe430, 0x00007fffffffe470]. Since the stack grows downwards and 0x00007fffffffe4f0 lies above %rbp, the address of the vector must belong to the caller's frame.
Furthermore, the same address is in %rdi, so the caller somehow knew in advance that it had to pass the address of the vector in %rdi. *
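To make the mechanism concrete, it can be mimicked by hand. The sketch below is not what the compiler emits; it is a hand-written analogue of what the register dump suggests, and `my_example_into` and `sum_via_caller_storage` are hypothetical names introduced only for this illustration. The caller reserves raw storage, passes its address down, and the callee constructs the vector directly at that address.

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical analogue of my_example(): instead of returning the vector,
// it constructs it in storage supplied by the caller.
std::vector<int>* my_example_into(void* result_slot) {
    // Placement new: build the vector directly at the caller's address.
    auto* v = new (result_slot) std::vector<int>();
    v->push_back(1);
    v->push_back(2);
    return v;  // hand the same address back
}

// Plays the role of main(): reserves the space, passes its address down,
// then uses and destroys the object that now lives there.
int sum_via_caller_storage() {
    alignas(std::vector<int>) unsigned char slot[sizeof(std::vector<int>)];
    std::vector<int>* v = my_example_into(slot);
    assert(static_cast<void*>(v) == static_cast<void*>(slot));
    int sum = 0;
    for (int x : *v) sum += x;
    v->~vector();  // the caller owns the storage, so it runs the destructor
    return sum;
}
```

In this hand-written form the agreement between the two sides is visible in the function signature itself, which is exactly the part that is hidden when the function returns by value.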
And here is my first question. Because these are two public functions, they could, at least potentially, though not in this example, belong to two different translation units. So how the heck can a compiler, when producing the code for ::my_example(), assume that the caller is playing the same game, i.e. expecting this optimisation too?
What if the caller did not provide the vector address in %rdi and instead expected a copy of the vector to be returned by ::my_example?
The follow-up is analogous. In the named RVO case, the callee might not even be able to apply this optimisation in the end. So how can a compiler, when translating ::main, expect, from the caller's point of view, that the address provided for the vector will be valid and will indeed contain the initialised vector?
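The second scenario can be poked at in the same way. Here is a sketch with hypothetical names (`pick`, `probe`, `PickReport`): the callee holds two named locals and chooses between them at run time, so at most one of them can live at the address the caller ends up with. The code makes no claim about what a particular compiler does; it only records what happened.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers: where the two locals lived inside pick().
static std::uintptr_t a_address = 0;
static std::uintptr_t b_address = 0;

std::vector<int> pick(bool first) {
    std::vector<int> a{1, 2};
    std::vector<int> b{3, 4, 5};
    a_address = reinterpret_cast<std::uintptr_t>(&a);
    b_address = reinterpret_cast<std::uintptr_t>(&b);
    if (first) return a;  // which local is returned is only known at run time
    return b;
}

struct PickReport {
    bool matches_a;    // caller's object sits where `a` was
    bool matches_b;    // caller's object sits where `b` was
    std::size_t size;  // number of elements actually received
};

PickReport probe(bool first) {
    auto v = pick(first);
    auto addr = reinterpret_cast<std::uintptr_t>(&v);
    // `a` and `b` were alive at the same time, so they cannot share an address.
    assert(a_address != b_address);
    return {addr == a_address, addr == b_address, v.size()};
}
```

Whatever the flags report, `size` shows that the caller receives a fully initialised vector in both branches.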
Regards,
Dean
* I am not posting the assembly for the sake of brevity; the important bit is that %rdi is not copied from something else in the function before this point.