why do push_back() and direct access work fine in these codes?

Sep 30, 2008 at 3:58am
Hello, could someone explain me why this two codes work? I got surprised when I found out that I can use either push_back() method or direct access mtr[i][j] when mtr was declared as: vector< vector<double> > mtr;

which of the codes would be the most efficient?

code 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#include <iostream>
#include <vector>

using namespace std;

int main()
{
  int length = 5;
  int num = 10;
  
  vector< vector<int> > mtr;
  
  mtr.resize(length);
  for(int i = 0; i < length; i++)
  {
    mtr[i].reserve(length);

    for (int j = 0; j < length; j++)
    {
      mtr[i][j] = num;
      cout<<"mtr["<<i<< "]["<< j<< "]= "<< mtr[i][j] <<endl;
    }
  }  
    
  return 0;
}


code 2:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#include <iostream>
#include <vector>

using namespace std;

int main()
{
  int length = 5;
  int num = 10;
  
  vector< vector<int> > mtr;
  
  mtr.resize(length);
  for(int i = 0; i < length; i++)
  {
    mtr[i].reserve(length);

    for (int j = 0; j < length; j++)
    {
      mtr[i].push_back(num);
      cout<<"mtr["<<i<< "]["<< j<< "]= "<< mtr[i][j] <<endl;
    }
  }  
    
  return 0;
}
Last edited on Sep 30, 2008 at 4:04am
Sep 30, 2008 at 1:13pm
They are not the same thing.

push_back() adds a new element to the end of the vector.
operator[] accesses an already existing element of the vector.

Also, be careful with your use of reserve() vs. resize() because they
too are not the same thing.

resize(N) causes the vector to contain exactly N elements.
reserve(N) causes the vector allocate enough backing store for N elements but does not actually cause the vector to contain N elements.

Sep 30, 2008 at 4:43pm
Yes, I know and that is why I am asking. Why does then mtr[i][j] work? For the reason you say above it should only work push_back() method.
Sep 30, 2008 at 7:02pm
It works because you are accessing memory that has been allocated by the vector.

So vector's guarantee to the user is that the memory allocated is one contiguous block.
vector has 3 data members: a pointer to the start of the memory block, a pointer to the
end of the memory block, and a pointer one past the last used element.

So in example 1, mtr[i] looks like this for example:
start = 0x1000
end = 0x1014
last = 0x1000

When you access mtr[i][j] with j = 1, for example, this writes a value to
memory location (start + sizeof(int)) or 0x1004. That memory is allocated
obviously. But the vector doesn't realize that index 1 is now used so
if you were to call size() at this time, it would still return 0.

Sep 30, 2008 at 10:32pm
Tanks. Yes, I read something similar in a book. Also, you will have problems using at(). So, the use of [] seems to be inconsistent here. For me that should be a warning saying: *Don't use it*. I particularly like push_back() method becuase is cheap. In a 1D vector there is no problem, but in 2D or higher order vectors inconsistencies like this arise. A partial solution I found is use resize before the loop and after the loop:
1
2
3
4
5
6
7
8
9
10
11
  mtr.resize(length);
  for(int i = 0; i < length; i++)
  {
    mtr[i].resize(length);

    for (int j = 0; j < length; j++)
    {
      mtr[i][j] = num;
      cout<<"mtr["<<i<< "]["<< j<< "]= "<< mtr[i][j] <<endl;
    }
  }  

But this certainly will initiate the rows with default number of the types, in that case, integers. That implies an extra work than reserving room for a certain qty. of integers becuase the function will set with 0 all the elements. This entiles an access to the vector. On the contrary, if I use reserve() method that means an empty vector that I can fill after using push_back() method.
Oct 1, 2008 at 12:21pm
I think it is a matter of personal preference. vectors were supposed to replace arrays and therefore have access similar to arrays, hence operator[]. Since no bounds checking is done when you access an array via [], no bounds checking is done on vector either. That is why at() was created -- it is the same thing as operator[], except it performs the bounds checking before it allows the user to access unallocated memory.

My preference is to use reserve() and push_back() instead of resize() and operator[] to fill out an array.

My preference also is to use iterators to walk the vector rather than indices, which means I use neither operator[] nor at() unless I'm being lazy.
Oct 1, 2008 at 5:10pm
I prefer the "+=" syntax with a std::back_inserter.

Since no bounds checking is done when you access an array via [], no bounds checking is done on vector either.

Not quite 100% true. The behavior after dereferencing an invalid pointer (or the past-the-end pointer, which is technically not 'invalid', but may not be dereferenced) is undefined, in C as well as in C++. Thus, operator[] might throw an exception if the operand is out of range (though it is not done, usually). If the operand is not out-of-range, operator[] just makes the guarantee to be done in O(1), so bound checking is possible. at() on the other hand is guaranteed to be range-checked and throw std::out_of_range if the index is invalid. But operator[] may be implemented in terms of at().
Oct 1, 2008 at 7:58pm
Thanks for the useful insights. I also prefer using reserve() and push_back() but, how can it be done in a 2d vector? I guess it is not possible without having the problem of size() and at() and possibly iterators.

Any suggestion?
Oct 2, 2008 at 12:23pm
Generally I'd try to do something like this:

1
2
3
4
5
6
7
8
vector< vector< int > > twodvec;

for( size_t i = 0; i < NUM_VECS; ++i ) {
    vector< int > v;
    for( size_t j = 0; j < NUM_NUMS; ++j )
         v.push_back( rand() );
    twodvec.push_back( v );
}


If you want to create a 2D vector where all the values are the same, as in your code above, there is a simpler way:

 
vector< vector< int > > vec( NUM_VECS, vector<int>( NUM_NUMS, VALUE ) );


would create a NUM_VECS x NUM_NUMS matrix where each element has the
value VALUE.
Oct 2, 2008 at 5:17pm
Thank you. I have also tried the your two above suggestion. But I am looking for something which is more efficient than that. For a large 2d vector the above code will generate several reallocations which is expensive. That is why I would like to use reserve() method, but it seems that it is not possible to do it. So, I will have to survive with what is available.
Oct 2, 2008 at 5:25pm
push_back() runs in amortized constant time due to exponential growth, no reserve() or other hacks are required. So what is the big deal, anyways? Have you used a profiler and it shows that the memory allocation is a problem? I doubt it, and even if so, it is most certainly not due to push_back()s. Efficiency is achieved by good algorithms, not by implementation details.
Oct 2, 2008 at 6:29pm
If you are concerned about performance, then in my 1st example above you just need
to add 2 calls to reserve: the first one at line 2, to reserve NUM_VECS elements in twodvec, and the second one between lines 4 and 5 to reserve NUM_NUMS elements in v.

My 2nd example above already is as efficient as it's going to get, though, since the vectors will construct themselves with only a single allocation each.

Oct 2, 2008 at 6:42pm
It seems reasonable. The only doubt I have then is how reserve() will know in line 2 the size that "v" will have between lines 4 and 5. I would think that reserve() will guess a number, say 10, but after when reserve() is called to allocate "v" the length of it might be a larger number, say 20, so reserve() has to allocate again. So, if this is encountered several times during the loop it might probably not be efficient. Maybe I am wrong?.
Oct 2, 2008 at 8:05pm
It doesn't need to know. A vector has three data members: start, end, and last, each of which are pointers. If sizeof(T*)==4 then twodvec has to allocate enough space to hold (3*4)*NUM_VECS bytes, which is what it will do.



Topic archived. No new replies allowed.