I came across the following piece of code, and I'm struggling to figure out exactly what's going on:
int n = 10;
std::vector<std::vector<int> > adjacencyMatrix(n, std::vector<int>(0));
I see that it is a vector of vectors, so it's essentially a matrix. However, the initialization of adjacencyMatrix is quite confusing. What is going on when we say adjacencyMatrix(n, std::vector<int>(0))? I'm assuming it's making n=10 rows of vectors, but I'm not too sure what std::vector<int>(0) does. Any help would be much appreciated.
Be careful when making matrices like this, because there's no way to ensure that the matrix doesn't end up jagged; that is, one row may have one number of columns and another row a different number.
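For what it's worth, here is a minimal sketch (reusing the numbers from the question; the push_back/assign calls are only illustrative) of what that initialization produces and how the rows can drift apart:

#include <iostream>
#include <vector>

int main() {
    int n = 10;
    // n rows, each copy-constructed from std::vector<int>(0), i.e. an empty row
    std::vector<std::vector<int> > adjacencyMatrix(n, std::vector<int>(0));

    std::cout << adjacencyMatrix.size() << '\n';    // 10 rows
    std::cout << adjacencyMatrix[0].size() << '\n'; // each row has 0 columns

    // Nothing enforces a rectangular shape:
    adjacencyMatrix[0].push_back(1); // row 0 now has 1 column
    adjacencyMatrix[1].assign(5, 0); // row 1 now has 5 columns -> jagged
}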
I wonder why so many use nested vectors instead of using a simple vector wrapped in a class.
Well, a nested vector is the most straightforward analog of a matrix, conceptually speaking. A lot of people don't realize that you can map a matrix onto a sequential structure.
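As a rough sketch of that mapping (row-major order; the names rows and cols are just for illustration):

#include <cstddef>
#include <vector>

int main() {
    std::size_t rows = 3, cols = 4;
    std::vector<int> flat(rows * cols); // one contiguous block, zero-initialized

    // The conceptual element (r, c) lives at index r * cols + c.
    std::size_t r = 1, c = 2;
    flat[r * cols + c] = 42;
}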
There must be a performance problem if you have a matrix 1000x1000
Nah, I doubt it. Not anymore, now that we have move semantics. A nested vector is less space-efficient, though, because it needs to store more pointers. There's also a slight penalty when iterating over all the cells, since the cells are not all contiguous in memory, which makes the cache less effective.
Let's assume that a vector<vector<int>> is used to represent an n-by-m matrix (n being the size of the outer vector, m being the size of each inner vector, with n > 0 && m > 0), and that there's an equivalent Matrix<int> that internally uses a single vector<int>.
Creation:
vector<vector<int>>:
Allocations: n + 1 (one for the outer and one for each inner)
Assignments: n * m
Matrix<int>:
Allocations: 1
Assignments: n * m
Notes: Creation happens only once, and the cost of initializing the cells to zero asymptotically overpowers the cost of allocation anyway.
Adding a row (n):
vector<vector<int>>:
Allocations: 1 or 2. The outer vector can be grown in O(1) if it has unused capacity. The new vector always has to be allocated.
Assignments: m
Matrix<int>:
Allocations: 0 or 1. The vector may have unused capacity.
Assignments: either m or n * m. If the vector doesn't have unused capacity, all cells have to be copied.
Adding a column (m):
vector<vector<int>>:
Allocations: between 0 and n. Each inner vector may have unused capacity.
Assignments: between n and n * m. At minimum, the new column has to be initialized; every inner vector that doesn't have unused capacity also has to be copied.
Matrix<int>:
Allocations: 0 or 1. The vector may have unused capacity.
Assignments: n * m. Adding a column always involves shifting the cells around.
Conclusion: Asymptotically, a nested vector is always at least as fast as a Matrix<int>. During creation a nested vector may be marginally slower. A Matrix<int> is always more costly to grow than a nested vector.
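To make the column case concrete, here is a sketch of what "shifting the cells around" looks like for a single row-major vector (the function and parameter names are made up):

#include <cstddef>
#include <vector>

// Append one column to a rows-by-cols matrix stored row-major in 'flat'.
// Every row gains one element, so everything after each insertion point shifts.
void addColumn(std::vector<int>& flat, std::size_t rows, std::size_t& cols, int value) {
    for (std::size_t r = rows; r-- > 0; ) // back to front keeps earlier indices valid
        flat.insert(flat.begin() + static_cast<std::ptrdiff_t>((r + 1) * cols), value);
    ++cols;
}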
I didn't compare access times, since that doesn't involve allocation. I believe Matrix<int> will be faster due to cache effects. Even without a cache, accessing a Matrix<int> involves fewer and cheaper operations than accessing a nested vector.
Is matrix[row * mRows + col] really cheaper than a 2D vector's [row][col] access? Tell me more about these "cache effects" ;D
There might be an invisible maintenance cost (math) for translating some matrix operations to 1D, but once these are ironed out, I suppose it could work.
The cache for the most part takes advantage of locality of reference, which you lose if you need to jump around in memory to look for the next cell. Obviously a nested vector is not as bad as a linked list, but a single vector will still be faster.
But like I said, even if your system didn't have a cache, vector[row * mRows + col] involves fewer operations than vector[row][col].
vector[row * mRows + col]:
//(Pseudo-Assembly)
byte *pointer = vector.array;                // base of the single data block
int element = row * mRows + col;             // flat index of the requested cell
int offset = element * sizeof(element_type);
load(pointer + offset);                      // one load fetches the value
vector[row][col]:
//(Pseudo-Assembly)
byte *pointer = vector.array;                      // base of the outer vector's data
int element = row;
int offset = element * sizeof(outer_element_type);
pointer = load(pointer + offset);                  // first load: fetch the inner vector's data pointer
element = col;
offset = element * sizeof(inner_element_type);
load(pointer + offset);                            // second load: fetch the value
The loads will be the most expensive operations.
Besides just performance, a single vector has other advantages: it's more space-efficient, it's easier to copy (often a single memcpy() is enough), and it's easier to store in files (if you don't care about portability, a single std::ostream::write() will save it).
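For instance, a non-portable dump of the whole matrix can be as small as this (just a sketch; it assumes the same platform reads the data back, and the function name is made up):

#include <fstream>
#include <vector>

void save(const std::vector<int>& flat, const char* path) {
    std::ofstream out(path, std::ios::binary);
    // Write the raw in-memory representation in one call.
    out.write(reinterpret_cast<const char*>(flat.data()),
              static_cast<std::streamsize>(flat.size() * sizeof(int)));
}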
There might be an invisible maintenance cost (math) for translating some matrix operations to 1D
Nah, you just write your access(x, y) function and then implement your math using that. Otherwise you'll pull your hair out.
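Something along these lines, as a rough sketch (class and member names are made up; the stride is the number of columns, so the matrix doesn't have to be square):

#include <cstddef>
#include <vector>

// Minimal row-major matrix wrapper around a single contiguous vector.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : mRows(rows), mCols(cols), mData(rows * cols) {}

    int& access(std::size_t row, std::size_t col)       { return mData[row * mCols + col]; }
    int  access(std::size_t row, std::size_t col) const { return mData[row * mCols + col]; }

    std::size_t rows() const { return mRows; }
    std::size_t cols() const { return mCols; }

private:
    std::size_t mRows, mCols;
    std::vector<int> mData;
};

// Every operation is then written in terms of access(), e.g.:
Matrix transpose(const Matrix& m) {
    Matrix t(m.cols(), m.rows());
    for (std::size_t r = 0; r < m.rows(); ++r)
        for (std::size_t c = 0; c < m.cols(); ++c)
            t.access(c, r) = m.access(r, c);
    return t;
}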