Question about using references

I am wondering if someone can tell me if I am using references properly to pass around a large data structure to different functions within my program. Please keep in mind my main priority is speed.

I have a very large database which is read into a vector data structure. My program first loads the database into a data structure defined in the 'DataBase' class. This data is then passed to several different functions and processed. Here is the first part of my main code:

class DataBase
{
}

class DataStructure1
{
}

class DataStructure2
{
}

void main
{
//load data base
DataBase &ref_data_base = load_data_base();

DataStructure1 &ref_structure1 = build_structure1(ref_data_base);

DataStructure2 &ref_structure2 = build_structure2(ref_structure1);

}


My functions look something like:

DataStructure1 build_structure1(&ref_data_base)
{
DataStructure1 data_structure1;
.
.
.
return data_structure1;
}


DataStructure2 build_structure2(&ref_structure1)
{
DataStructure2 data_structure2;
.
.
.
return data_structure2;
}


I have not used references before so I am not very sure if I am doing this correctly. Also, if my main goal is performance, should I consider pointers?


Passing by reference passes a pointer to the object rather than the value of the object, so you are using pointers. It can sometimes be neater in larger projects using parallelization to initially declare an object as a pointer (e.g. int* a = new int(5)), but this doesn't seem to be relevant to you.

In the case of the code above, you are not correctly understanding the way memory is handled in C++ and you have a few syntax errors. First off, you have declared pointers to an object that does not exist in the relevant namespace on lines 16, 18, and 20. You still must declare a variable normally (unless you declare a variable as above) before defining it. Second, you do not name a type for the arguments to build_structure1 and build_structure2. Lastly, class declarations must be followed by semicolons (only function definitions are not).

The following is correct:
class DataBase { ... };

class DataStructure1 { ... };

class DataStructure2 { ... };

int main() {
  //load data base
  DataBase ref_data_base = load_data_base();  //<<-- Note the lack of '&'s here.
  DataStructure1 ref_structure1 = build_structure1(ref_data_base);
  DataStructure2 ref_structure2 = build_structure2(ref_structure1);
}


and

DataStructure1 build_structure1(DataBase &ref_data_base) { //<<-- Note the type name 'DataBase'
  DataStructure1 data_structure1;
  ...
  return data_structure1;
}


DataStructure2 build_structure2(DataStructure1 &ref_structure1) {
  DataStructure2 data_structure2;
  ...
  return data_structure2;
}
If your priority is speed, you should be taking arguments by reference to const, returning new objects by value, and modifying existing objects by reference:

DataStructure1 build_structure1(const DataBase& db)
{
    DataStructure1 ds;
    ...
    return ds;
}

void modify_structure1(DataStructure1& ds)
{
   ...
}
...
DataStructure1 structure1 = build_structure1(ref_data_base);
modify_structure1(structure1);


Pointers are needed if you're dealing with objects that outlive their scopes or have some other nontrivial lifetime management logic, which may or may not be relevant here.
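To illustrate the lifetime point, here is a minimal sketch of an object that has to outlive the scope that created it and is therefore held through a pointer. The choice of std::unique_ptr, the contents of DataBase, and the names load_data_base_owned and record_count are assumptions made for illustration; none of them come from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the poster's DataBase class.
struct DataBase {
    std::vector<int> records;
};

// Creates the object on the heap and hands ownership to the caller.
// The object outlives this function's scope; the unique_ptr destroys it
// automatically when its owner goes away.
std::unique_ptr<DataBase> load_data_base_owned(std::size_t n) {
    auto db = std::make_unique<DataBase>();  // requires C++14
    db->records.assign(n, 0);
    return db;  // ownership is moved out, the records are not copied
}

// Code that only reads the object still takes it by reference to const.
std::size_t record_count(const DataBase& db) {
    return db.records.size();
}
```

A caller would keep the returned std::unique_ptr alive for as long as the data is needed and pass *db to functions such as record_count.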
I would amend Cubbi's comments to say that if speed is a concern, new objects should be returned as a reference argument, too, to eliminate a call to the assignment operator, especially if the new object is large.

void build_structure1(const DataBase& db, DataStructure1& newObject)
{
    // initialize newObject from db
}

int main()
{
   DataBase db = ...;
   DataStructure1 ds;      // assuming DataStructure1 has a default constructor

   build_structure1(db, ds);
}


Edit: fixed typos
Creating an object and passing it by reference to avoid returning one by value is premature pessimization. In many cases the compiler can elide the temporary altogether via RVO. And the scenario is even brighter with move semantics.

Presumably that wouldn't be a call to the assignment operator, but a call to the copy constructor (which wouldn't be made at all with RVO.)
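The claim about copies can be checked with a small experiment. The sketch below uses a hypothetical type, Tracked, that counts calls to its copy operations; the names Tracked, build_tracked and copies_when_returning_by_value are invented for this illustration. Returning a local object by name is either elided or treated as a move, so the copy counter should stay at zero on a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type that counts how often it is copied.
struct Tracked {
    static int copies;
    std::vector<int> data;

    Tracked() = default;
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
    Tracked(Tracked&& other) noexcept : data(std::move(other.data)) {}
    Tracked& operator=(const Tracked& other) {
        data = other.data;
        ++copies;
        return *this;
    }
    Tracked& operator=(Tracked&& other) noexcept {
        data = std::move(other.data);
        return *this;
    }
};
int Tracked::copies = 0;

// Builds a large object and returns it by value.
Tracked build_tracked(std::size_t n) {
    Tracked t;
    t.data.assign(n, 1);
    return t;  // elided (NRVO) or moved, never copied
}

// Returns the number of copies made while receiving the result by value.
int copies_when_returning_by_value() {
    Tracked::copies = 0;
    Tracked t = build_tracked(1000);
    assert(t.data.size() == 1000);
    return Tracked::copies;
}
```

Whether the move itself is also elided depends on the compiler, which is why the sketch counts copies only.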
"Passing by reference passes a pointer to the object rather than the value of the object, so you are using pointers."
No, passing by reference just gives another name to an object that already exists in memory; it is not the same thing as passing a pointer to some location in memory. References cannot be NULL. (Disclaimer: references should never be NULL, but this is another topic completely.)
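The difference in semantics can be shown with a short sketch. Compilers commonly implement a reference parameter by passing an address, but at the language level the two behave differently. The function names below are hypothetical and exist only for this example.

```cpp
#include <cassert>

// A reference is another name for an existing object.
bool reference_is_alias() {
    int value = 5;
    int& ref = value;
    ref = 7;  // writes to value itself
    return value == 7 && &ref == &value;  // same object, same address
}

// A pointer is a separate object holding an address.
// It may be null and it may be reseated to point elsewhere.
bool pointer_can_be_reseated() {
    int a = 1;
    int b = 2;
    int* p = nullptr;  // a pointer may point at nothing
    p = &a;
    *p = 10;           // modifies a
    p = &b;            // now points at a different object
    *p = 20;           // modifies b
    return a == 10 && b == 20;
}
```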

"First off, you have declared pointers to an object that does not exist in the relevant namespace on lines 16, 18, and 20."
This is incorrect: the OP declares and initializes references, not pointers. The actual error is that the functions return by value, so each call produces a temporary, and a non-const reference cannot bind to a temporary. Had he instead chosen to return a reference, it would refer to a local object that ceases to exist once the function returns.
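As a rough sketch of the binding rules involved, assuming a simplified DataStructure1 with made-up contents and hypothetical function names: in standard C++ a non-const reference cannot be initialized from a function's by-value result, while a plain object or a reference to const can.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical stand-in for the poster's DataStructure1.
struct DataStructure1 {
    std::vector<int> values;
};

// Returns a new object by value, as in the original question.
DataStructure1 build_structure1_sketch() {
    DataStructure1 ds;
    ds.values = {1, 2, 3};
    return ds;
}

int sum_of_built_structures() {
    // DataStructure1& bad = build_structure1_sketch();
    // ^ ill-formed in standard C++: non-const reference to a temporary

    // Owns the result outright.
    DataStructure1 by_value = build_structure1_sketch();

    // Binding to a reference to const keeps the temporary alive
    // for as long as the reference is in scope.
    const DataStructure1& by_const_ref = build_structure1_sketch();

    int sum = 0;
    for (int v : by_value.values) sum += v;
    for (int v : by_const_ref.values) sum += v;
    return sum;
}
```

Declaring a plain object, as in the corrected main earlier in the thread, is the simpler of the two working forms.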