On returning references to local objects and the subscript operator

I'd like to check my understanding of returning references. In particular, I have two questions.


It's well known that we should never return references to local objects. For instance, the following code

int& foo(){
    int a = 2;
    return a;
}


has the problem that once the function returns, the returned reference dangles: the int object a has a lifetime limited to the body of foo(), so using the reference afterwards is undefined behaviour.


So far so good.
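For contrast, here is a minimal sketch of two ways to avoid the dangling reference: return a copy, or return a reference to an object with static storage duration. The names foo_by_value and foo_static are made up for this illustration and are not part of the original question.

```cpp
// Hypothetical alternatives to foo(); neither returns a reference to a dead local.

// Returns a copy: the caller receives its own int.
int foo_by_value() {
    int a = 2;
    return a;
}

// Returns a reference to a function-local static, which lives until the program ends.
int& foo_static() {
    static int a = 2;
    return a;
}
```

Returning by value is usually the simpler choice for small types such as int; the static variant is shown only because it makes the lifetime difference visible.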

But if we have our classical Vector class, and consider the usual subscripting operator


    T& operator[](const std::size_t i){
        return elem[i];
    }


Here my first question arises: Why is this safe? Is this because elem is allocated on the heap, and hence outlives the function call?


- - - - - - - - - - - -
Now to the second question/check. Consider the following Vector class, where, for simplicity, I omit copy and move semantics.


#include <algorithm>
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <memory>

template <typename T>
class Vector{
private:
    std::size_t _size;
    std::unique_ptr<T[]> elem;   // owns the heap array holding the elements

public:
    // Members are initialized in declaration order: _size first, then elem.
    Vector(const std::size_t size): _size{size}, elem{new T[size]}{}
    ~Vector()=default;

    Vector(std::initializer_list<T> _list): _size{_list.size()}, elem{new T[_list.size()]}{
        std::copy(_list.begin(),_list.end(),elem.get());
    }

    // Returns a reference to the i-th element of the array owned by elem (no bounds check).
    T& operator[](const std::size_t i){
        return elem[i];
    }

    const T& operator[] (const std::size_t i)const{
        return elem[i];
    }

    friend std::ostream& operator<<(std::ostream& os, const Vector& vec){
        for (std::size_t i=0; i< vec._size; ++i) {
            os << vec[i] <<"\n";
        }
        return os;
    }
};


int main(){
    Vector<int> v{1,2,3,4};
    std::cout << v << "\n";
    v[2] = 23;   // assigns through the reference returned by operator[]
    std::cout << "After [] operator: \n"<<v << "\n";

    return 0;
}



I'd like to check my understanding of what happens under the hood when I write v[2] = 23.

First of all, the subscripting corresponds to writing

 
v.operator[](2) = 23;


The left-hand side of the assignment is a reference (T&) to elem[2], and I am assigning 23 through that reference. Hence it is as if I had written

int& a = elem[2];
a = 23;


because now we have actually changed elem[2].


Is this correct?


Why is this safe? Is this because elem is allocated on the heap, and hence outlives the function call?
elem is obviously a member variable. It's generally safe to return a reference to a member variable.

In that case you return a reference to some data from elem. This is only safe as long as elem doesn't change.


because now we have actually changed elem[2].


Is this correct?
Yes.
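A short sketch of the equivalence just confirmed, using std::vector in place of the custom Vector (an assumption made only so the snippet stands on its own). The function name write_three_ways is hypothetical.

```cpp
#include <vector>

// Writes through the reference returned by operator[], spelled three equivalent ways.
std::vector<int> write_three_ways() {
    std::vector<int> v{1, 2, 3, 4};
    v[0] = 10;              // usual subscript syntax
    v.operator[](1) = 20;   // the same call written out explicitly
    int& a = v[2];          // bind the returned reference to a name...
    a = 30;                 // ...and assign through it
    return v;               // element 3 is left untouched
}
```

All three forms modify the element stored inside the container, because each one goes through the T& that operator[] returns.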
Thanks for the check(s). So it seems that data members can be returned by reference regardless of whether they are pointers or not, right?

So, something like the following snippet (which, as can be tested, works) is allowed because a is a data member?

#include <iostream>

class Foo{
private:
    int a;
    
public:
    Foo(const int& _a): a{_a}{}
    
    
    int& bar(){
        return a;
    }
    
};


int main(){
    
    Foo test{2};
    std::cout << test.bar() <<"\n";
    return 0;
}
So it seems that data members can be returned by reference regardless of whether they are pointers or not, right?
Not sure what you mean here. It is safe to return the pointer, since it is a member variable. It is, however, not safe to return a reference to the data pointed to.

So, something like the following snippet (which, as can be tested, works) is allowed because a is a data member?
Yes.
Thanks for the check(s). So it seems that data members can be returned by reference regardless of whether they are pointers or not, right?


There's no magic rule about references to something pointed to by a data member always being valid.

The important thing to understand is the lifespan of the thing you're returning a reference to. From what you've shown us of your code, it looks as though the memory elem points to remains allocated for the entire life of the Vector object. Therefore, as long as the Vector object exists, references to elements of the array that elem points to will be valid.

If your Vector class at any point changes what elem points to, then any references you had to the old array would become useless (and, if you delete the old array, invalid). That is something I would expect to happen in a vector-like class, because one of the important features of such a container is that it manages its own memory, and re-allocates memory when required.


EDIT: In the simpler Foo class that you posted, the reference returned by bar() is valid for as long as the Foo object exists, yes.

I'd also add that there's really no point passing a primitive into the constructor as a const reference; it's no more performant than just passing the primitive by value.
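To make the lifetime point concrete, here is a small sketch in which the owning object is destroyed explicitly. It assumes a simplified aggregate version of Foo and a hypothetical function read_while_alive; the std::unique_ptr is used only to control when the Foo object dies.

```cpp
#include <memory>

// Simplified stand-in for the Foo class discussed above (assumption: public member).
struct Foo {
    int a;
    int& bar() { return a; }
};

// Uses the reference only while the Foo object exists.
int read_while_alive() {
    auto owner = std::make_unique<Foo>(Foo{2});
    int& r = owner->bar();   // valid: *owner exists
    r = 7;                   // writes to owner->a
    int seen = owner->a;     // read while the object is still alive
    owner.reset();           // Foo destroyed here; r dangles and must not be used again
    return seen;
}
```

The reference itself does nothing to keep the Foo object alive, so any use of r after owner.reset() would be undefined behaviour.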
From what you've shown us of your code, it looks as though the memory elem points to remains allocated for the entire life of the Vector object.


Uhm, but since elem is a unique_ptr, then the memory will be automatically released when the Vector goes out of scope. Otherwise I should provide a destructor, but I used a unique_ptr for this precise purpose.


The important thing to understand is the lifespan of the thing you're returning a reference to.


As for my last code snippet, now I am pretty confused: what is the lifespan of the data member a? Is it safe to return a reference to it as I did in my snippet?
@coder777


It is safe to return the pointer, since it is a member variable. It is, however, not safe to return a reference to the data pointed to.


I suppose you're referring to the Vector case. But when I return elem[i] in the subscript operator, am I not returning a reference to the data pointed to?
VoB wrote:
Uhm, but since elem is a unique_ptr, then the memory will be automatically released when the Vector goes out of scope. Otherwise I should provide a destructor, but I used a unique_ptr for this precise purpose.

Yes. What's your point?

As for my last code snippet, now I am pretty confused: what is the lifespan of the data member a? Is it safe to return a reference to it as I did in my snippet?

I answered that as an edit to my previous post, but that's easy to miss:

MikeyBoy wrote:
In the simpler Foo class that you posted, the reference returned by bar() is valid for as long as the Foo object exists, yes.

I'd also add that there's really no point passing a primitive into the constructor as a const reference; it's no more performant than just passing the primitive by value.
Okay thanks, I think I got the point: as long as an object exists, it's okay to return references to its members; otherwise we would end up in a situation similar to the one in my first message.

There's still something unclear to me, and it's about coder777's last message:

It is safe to return the pointer, since it is a member variable. It is, however, not safe to return a reference to the data pointed to.


I can't understand why it is not safe to return a reference to the data pointed to, because that is precisely what I did when I wrote the subscript operator for my custom Vector, where I returned a reference to elem[i].

What am I missing?
I can't speak for coder777, but they may be thinking along the same lines as I was when I said:

MikeyBoy wrote:
If your Vector class at any point changes what elem points to, then any references you had to the old array would become useless (and, if you delete the old array, invalid). That is something I would expect to happen in a vector-like class, because one of the important features of such a container is that it manages its own memory, and re-allocates memory when required.
Thanks for your viewpoint. Actually, that is the classical way to write the subscript operator[], so it should not be a problem I guess.

So you're saying that if I push_back an integer to my vector, i.e. I am changing what elem points to, then any reference to the old array becomes useless. But I think this is not true because when we implemented our Vector class, I could still access some "old" element before the push_back.

I'm sorry for bothering you, but I can't see what I am missing.
So you're saying that if I push_back an integer to my vector, i.e. I am changing what elem points to, then any reference to the old array becomes useless.

Yes. References can't be reseated once they're created, so if your calling code is still holding onto a reference that was returned before elem got re-allocated, then it will still be referencing the old array, not the re-allocated one.

But I think this is not true because when we implemented our Vector class, I can access some "old" element before the push_back.

It may be that you didn't delete the old memory before re-allocating the array, so that old array is still sitting there. It's not being used by your Vector object any more, but it's still there. That's why I called it "useless" rather than "invalid".

If you did delete the old memory, then it's good old-fashioned undefined behaviour.

I'm sorry for bothering you, but I can't see what I am missing.

No need to apologize - this can be difficult to get your head around.
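The re-allocation described here can be observed without touching a stale reference. The sketch below uses std::vector rather than the custom Vector (an assumption for self-containment) and records the element's address as an integer before growing the container. The function name storage_moved_after_growth is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Reports whether growing the vector moved element 2 to a different address.
bool storage_moved_after_growth() {
    std::vector<int> v{1, 2, 3, 4};
    // Store the address as an integer so that no stale pointer is used later.
    const auto before = reinterpret_cast<std::uintptr_t>(&v[2]);
    v.reserve(v.capacity() + 1);   // request exceeds the capacity, so storage is re-allocated
    const auto after = reinterpret_cast<std::uintptr_t>(&v[2]);
    return before != after;
}
```

A reference obtained before the reserve call would still name the old location, which is why it cannot be used afterwards.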
if your calling code is still holding onto a reference that was returned before elem got re-allocated, then it will still be referencing the old array, not the re-allocated one.

So, just to make a concrete example of what you said:

std::vector v{1,2,3,4};
int a = v[v.size()-1];
v.push_back(20);


Here a is defined to be 4, and after the push_back it's still 4, because it's still referring to the old array. I think you're referring precisely to this, right?
As long as you're not changing what the pointer points to, there is no problem. The problem arises when you change the data pointed to, as you do, for example, in your move constructor/assignment. The referenced data is no longer valid after such an event.

I would also suggest adding an assert for nullptr and an out-of-bounds check for i in your operator[].
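One possible shape for the suggested checks, as a sketch only. CheckedVector is a hypothetical cut-down class that mirrors the member names _size and elem from the Vector posted earlier; the size() accessor is an addition for the example. The asserts disappear when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

template <typename T>
class CheckedVector {   // hypothetical name, not the original Vector
    std::size_t _size;
    std::unique_ptr<T[]> elem;

public:
    // Value-initializes the elements so that reads are well defined.
    explicit CheckedVector(std::size_t size) : _size{size}, elem{new T[size]{}} {}

    std::size_t size() const { return _size; }

    T& operator[](std::size_t i) {
        assert(elem != nullptr && "no storage, e.g. a moved-from object");
        assert(i < _size && "index out of bounds");
        return elem[i];
    }

    const T& operator[](std::size_t i) const {
        assert(elem != nullptr && "no storage, e.g. a moved-from object");
        assert(i < _size && "index out of bounds");
        return elem[i];
    }
};
```

Asserts catch mistakes in debug builds at no cost in release builds; a separate at() member that throws std::out_of_range, as std::vector provides, would be the alternative if checks are wanted in release builds too.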
@coder777

The problem arises when you change the data pointed to, as you do, for example, in your move constructor/assignment. The referenced data is no longer valid after such an event.


Do you mean something like:

std::vector<int> v{1,2,3,4};
int a = v[2];
std::vector<int> w{std::move(v)};
std::cout<< v[2]<<"\n";

right?

Indeed, after the move constructor has run, v can no longer be used like this, and I get a segmentation fault.
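A hedged sketch of a more defensive version of this experiment. A moved-from std::vector is still a valid object, but its contents should not be assumed; in practice it is typically left empty, so indexing it is out of bounds. The helper names safe_to_index and read_after_move are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// True if index i can be used with v.
bool safe_to_index(const std::vector<int>& v, std::size_t i) {
    return i < v.size();
}

// Reads element 2 from whichever vector can legitimately provide it after the move.
int read_after_move() {
    std::vector<int> v{1, 2, 3, 4};
    std::vector<int> w{std::move(v)};   // w takes over the elements
    // Query the moved-from vector before indexing it instead of assuming anything.
    return safe_to_index(v, 2) ? v[2] : w[2];
}
```

Calling size() or empty() on a moved-from vector is fine because those members have no preconditions; operator[] does have one, namely that the index is in range.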


Here a is defined to be 4, and after the push_back it's still 4, because it's still referring to the old array. I think you're referring precisely to this, right?

No. It's because a is not a reference. It's a plain int, and changing something in v is not going to magically cause a to change its value.

If a were a reference, defined as:

 
int& a = v[2];

then, yes, assuming your push_back function allocates new memory and doesn't delete the old array. Which would be bad in a different way, because it's a memory leak.

If it does delete the old array, then the reference is no longer valid, and this is undefined behaviour.
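The difference between a copy and a reference can be shown without any re-allocation at all. This is a minimal sketch using std::vector; the function name copy_versus_reference is made up for the example.

```cpp
#include <utility>
#include <vector>

// Returns {value of the copy, value seen through the reference} after v[2] is overwritten.
std::pair<int, int> copy_versus_reference() {
    std::vector<int> v{1, 2, 3, 4};
    int copy = v[2];   // independent int holding 3
    int& ref = v[2];   // alias for the element itself
    v[2] = 99;         // plain assignment, no re-allocation, so ref stays valid
    return {copy, ref};
}
```

The copy keeps its old value because it is a separate object, while the reference reflects the change because it names the element inside the vector.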
Ah yes, now I think I finally see your point. I'll try to work this out using the earlier example; please let me know if I'm still missing the point:

1
2
3
4
5
6
7
8
int main(){
    std::vector<int> v{1,2,3,4};
    int& a = v[2];
    //v.push_back(5);
    a = 200;
    std::cout << v[2] <<"\n";
    return 0;
}


the output is:

200


so a is really a reference, as expected, because no change has been applied to v.

If I uncomment the push_back line, I obtain:

3


which means that a is useless. It's interesting to me to notice that it's not invalid: possibly because std::vector's elem is a smart pointer, and the reserve function does something like elem.reset(tmp), which doesn't delete the old array but just changes where the pointer elem points.


Hope that everything is fine now :-)
which means that a is useless. It's interesting to me to notice that it's not invalid
I'm going to disagree with MikeyBoy, here. The reference is invalid. Using it in any way after the push_back() leads to undefined behavior, as per the std::vector specification. Since the behavior is undefined, there's little point in trying to reason about the program's behavior after line 5, since all behaviors are permissible.
@helios

Thanks for your comment. Could you be more precise about exactly why that reference is invalid?
If you push_back() while size() == capacity(), all references into the vector are invalidated. At line 4, all you can know about the capacity is that it's >= 4. Since it's possible for the capacity to be equal to the size and you don't check it, that's enough to say that program's behavior is undefined (at least until we start talking about a particular language implementation).
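Following that rule, one way to make the earlier experiment well defined is to reserve room before taking the reference, so that the push_back cannot re-allocate. This is a sketch; the function name write_through_reference_after_push_back is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Keeps a reference valid across push_back by guaranteeing spare capacity first.
int write_through_reference_after_push_back() {
    std::vector<int> v{1, 2, 3, 4};
    v.reserve(v.size() + 1);           // capacity() is now at least size() + 1
    int& a = v[2];                     // taken after reserve, so it refers to the final storage
    assert(v.size() < v.capacity());   // the next push_back cannot re-allocate
    v.push_back(5);
    a = 200;                           // still valid: no re-allocation took place
    return v[2];
}
```

When the final size is not known in advance, holding an index (here 2) instead of a reference avoids the problem entirely, since indices stay meaningful across re-allocation.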