I came across the following piece of code, and I'm struggling to figure out exactly what's going on:
int n = 10;
std::vector<std::vector<int> > adjacencyMatrix(n, std::vector<int>(0));
I see that it is a vector of vectors, so it's essentially a matrix. However, the initialization of adjacencyMatrix is quite confusing. What is going on when we say adjacencyMatrix(n, std::vector<int>(0))? I'm assuming it's making n=10 rows of vectors, but I'm not too sure what std::vector<int>(0) does. Any help would be much appreciated.
Be careful when making matrices like this, because there's no way to ensure that the matrix doesn't end up jagged; that is, one row may have one number of columns and another row a different number.
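For what it's worth, here is a minimal sketch (reusing the numbers from the question; the push_back/assign calls are only illustrative) of what that initialization produces and how the rows can drift apart:

#include <iostream>
#include <vector>

int main() {
    int n = 10;
    // n rows, each copy-constructed from std::vector<int>(0), i.e. an empty row
    std::vector<std::vector<int> > adjacencyMatrix(n, std::vector<int>(0));

    std::cout << adjacencyMatrix.size() << '\n';    // 10 rows
    std::cout << adjacencyMatrix[0].size() << '\n'; // each row has 0 columns

    // Nothing enforces a rectangular shape:
    adjacencyMatrix[0].push_back(1); // row 0 now has 1 column
    adjacencyMatrix[1].assign(5, 0); // row 1 now has 5 columns -> jagged
}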
I wonder why so many use nested vectors instead of using a simple vector wrapped in a class.
Well, a nested vector is the most straightforward analog of a matrix, conceptually speaking. A lot of people don't realize that you can map a matrix onto a sequential structure.
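As a rough sketch of that mapping (row-major order; the names rows and cols are just for illustration):

#include <cstddef>
#include <vector>

int main() {
    std::size_t rows = 3, cols = 4;
    std::vector<int> flat(rows * cols); // one contiguous block, zero-initialized

    // The conceptual element (r, c) lives at index r * cols + c.
    std::size_t r = 1, c = 2;
    flat[r * cols + c] = 42;
}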
There must be a performance problem if you have a matrix 1000x1000
Nah, I doubt it. Not anymore, now that we have move semantics. A nested vector is less space-efficient, though, because it needs to store more pointers. There's also a slight penalty when iterating over all the cells, since the cells are not all contiguous in memory, which makes the cache less effective.
Let's assume that a vector<vector<int>> is used to represent an n-by-m matrix (n being the size of the outer vector, m being the size of each inner vector, with n > 0 && m > 0), and that there's an equivalent Matrix<int> that internally uses a single vector<int>.
Creation:
vector<vector<int>>:
Allocations: n + 1 (one for the outer and one for each inner)
Assignments: n * m
Matrix<int>:
Allocations: 1
Assignments: n * m
Notes: Creation happens only once, and the cost of initializing the cells to zero asymptotically overpowers the cost of allocation anyway.
Adding a row (n):
vector<vector<int>>:
Allocations: 1 or 2. The outer vector can be grown in O(1) if it has unused capacity. The new vector always has to be allocated.
Assignments: m
Matrix<int>:
Allocations: 0 or 1. The vector may have unused capacity.
Assignments: either m or n * m. If the vector doesn't have unused capacity, all cells have to be copied.
Adding a column (m):
vector<vector<int>>:
Allocations: between 0 and n. Each inner vector may have unused capacity.
Assignments: between n and n * m. At minimum, the new column has to be initialized; every inner vector that doesn't have unused capacity also has to be copied.
Matrix<int>:
Allocations: 0 or 1. The vector may have unused capacity.
Assignments: n * m. Adding a column always involves shifting the cells around.
Conclusion: Asymptotically, a nested vector is always at least as fast as a Matrix<int>. During creation a nested vector may be marginally slower. A Matrix<int> is always more costly to grow than a nested vector.
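To make the column case concrete, here is a sketch of what "shifting the cells around" looks like for a single row-major vector (the function and parameter names are made up):

#include <cstddef>
#include <vector>

// Append one column to a rows-by-cols matrix stored row-major in 'flat'.
// Every row gains one element, so everything after each insertion point shifts.
void addColumn(std::vector<int>& flat, std::size_t rows, std::size_t& cols, int value) {
    for (std::size_t r = rows; r-- > 0; ) // back to front keeps earlier indices valid
        flat.insert(flat.begin() + static_cast<std::ptrdiff_t>((r + 1) * cols), value);
    ++cols;
}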
I didn't compare access times, since that doesn't involve allocation. I believe Matrix<int> will be faster due to cache effects. Even without a cache, accessing a Matrix<int> involves fewer and cheaper operations than accessing a nested vector.
Is matrix[row * mRows + col] really cheaper than a 2D vector's [row][col] access? Tell me more about these "cache effects" ;D
There might be an invisible maintenance cost (math) for translating some matrix operations to 1D, but once these are ironed out, I suppose it could work.
The cache for the most part takes advantage of locality of reference, which you lose if you need to jump around in memory to look for the next cell. Obviously a nested vector is not as bad as a linked list, but a single vector will still be faster.
But like I said, even if your system didn't have a cache, vector[row * mRows + col] involves fewer operations than vector[row][col].
vector[row * mRows + col]:
//(Pseudo-Assembly)
byte *pointer = vector.array;                // base of the single data block
int element = row * mRows + col;             // flat index of the requested cell
int offset = element * sizeof(element_type);
load(pointer + offset);                      // one load fetches the value
vector[row][col]:
//(Pseudo-Assembly)
byte *pointer = vector.array;                      // base of the outer vector's data
int element = row;
int offset = element * sizeof(outer_element_type);
pointer = load(pointer + offset);                  // first load: fetch the inner vector's data pointer
element = col;
offset = element * sizeof(inner_element_type);
load(pointer + offset);                            // second load: fetch the value
The loads will be the most expensive operations.
Besides just performance, a single vector has other advantages: it's more space-efficient, it's easier to copy (often a single memcpy() is enough), and it's easier to store in files (if you don't care about portability, a single std::ostream::write() will save it).
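For instance, a non-portable dump of the whole matrix can be as small as this (just a sketch; it assumes the same platform reads the data back, and the function name is made up):

#include <fstream>
#include <vector>

void save(const std::vector<int>& flat, const char* path) {
    std::ofstream out(path, std::ios::binary);
    // Write the raw in-memory representation in one call.
    out.write(reinterpret_cast<const char*>(flat.data()),
              static_cast<std::streamsize>(flat.size() * sizeof(int)));
}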
There might be an invisible maintenance cost (math) for translating some matrix operations to 1D
Nah, you just write your access(x, y) function and then implement your math using that. Otherwise you'll pull your hair out.
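Something along these lines, as a rough sketch (class and member names are made up; the stride is the number of columns, so the matrix doesn't have to be square):

#include <cstddef>
#include <vector>

// Minimal row-major matrix wrapper around a single contiguous vector.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : mRows(rows), mCols(cols), mData(rows * cols) {}

    int& access(std::size_t row, std::size_t col)       { return mData[row * mCols + col]; }
    int  access(std::size_t row, std::size_t col) const { return mData[row * mCols + col]; }

    std::size_t rows() const { return mRows; }
    std::size_t cols() const { return mCols; }

private:
    std::size_t mRows, mCols;
    std::vector<int> mData;
};

// Every operation is then written in terms of access(), e.g.:
Matrix transpose(const Matrix& m) {
    Matrix t(m.cols(), m.rows());
    for (std::size_t r = 0; r < m.rows(); ++r)
        for (std::size_t c = 0; c < m.cols(); ++c)
            t.access(c, r) = m.access(r, c);
    return t;
}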