Accessing a one-dimensional vector in a two-dimensional manner

Is there a way, by overloading operator[] of a std::vector<int> my_v(row*col), to access it as my_v[row][col]?
How about:
struct Row {
  // ..
  int& operator[] (size_t);
};

class Matrix {
  // ...
  Row& operator[] (size_t);
  // ..
};
Thanks for your response. I tried to implement your idea, but I could not get both structs to access the same vector. I would be pleased if someone could show me a possible implementation.
Can you show your attempt?
#include <vector>
#include <iostream>

template <typename T>
struct Row
{
     // here I don't know how to implement the stuff
     T & operator[]( int col ) { /* ??? */ }
};


template <typename T>
class Matrix
{
    std::vector<T> m_v;
    int m_row_size, m_col_size;

public:
    Matrix( int rows, int cols)
    : m_row_size{rows}, m_col_size{cols}, m_v { rows * cols }
    {}

    Row & operator[]( int row) { return /* ??? */ ; }
};

int main()
{
    Matrix<int> matrix( 10, 10 );

    for( int row = 0; row < 10; ++ row) {
        for( int col = 0; col < 10; ++ col ) {
            matrix[ row ][ col ] = row+1 * col+1;
            std::cout << matrix[ row ][col ] << ' ';
        }
        std::cout << '\n';
    }
}

*edited (4x, and definitively last time ;)
struct Row needs to know the Matrix and the row it refers to. It's easiest to move the struct inside Matrix.

In the constructor: I think m_v { rows * cols } should be m_v(rows * cols) — with braces, a vector<int> gets a single element with value rows*cols instead of rows*cols elements.
In main: remember your operator precedence rules: row+1 * col+1 is evaluated as row + (1*col) + 1. I think you mean (row+1) * (col+1).

To ensure the validity of the test, populate the whole matrix before you print it. Otherwise you might be storing the values in the wrong place and then immediately printing the same value from the same wrong place.
#include <vector>
#include <iostream>



template <typename T>
class Matrix
{
    int m_row_size, m_col_size;
    std::vector<T> m_v;

    struct Row {
        Matrix<T> &mat;
        size_t row;
        Row(Matrix<T> &m, size_t r) :
            mat(m), row(r) {}
        // compute the flat offset into the Matrix's shared vector
        T & operator[]( int col ) { return mat.m_v[row*mat.m_col_size + col]; }
    };


public:
    Matrix( int rows, int cols)
	: m_row_size{rows}, m_col_size{cols}, m_v(rows * cols) // DMH
    {}

    Row operator[]( int row) { return Row(*this, row); }
};

int main()
{
    Matrix<int> matrix( 10, 10 );

    // Populate it
    for( int row = 0; row < 10; ++ row) {
        for( int col = 0; col < 10; ++ col ) {
            matrix[ row ][ col ] = (row+1) * (col+1); // note parens
        }

    }
    // Then print it
    for( int row = 0; row < 10; ++ row) {
        for( int col = 0; col < 10; ++ col ) {
            std::cout << matrix[ row ][col ] << ' ';
        }
        std::cout << '\n';
    }
}

Personally I think operator()(int,int) is the best-looking wrapper for a 2D array. Or you can compose your code differently: a function inside your Map_Manager called Tile& GetTile(int,int) is better than Matrix<Tile>& GetMap(), which gets typed as map.GetMap()[x][y]. It does look better if you store the matrix in a local variable, and a matrix class can be more robust if you give it more functionality, but in the end it is just aesthetics. You can also just manually type [y*width + x] everywhere; just avoid std::vector's operator[] and use std::vector::at() instead, since it will catch bugs without debug flags.
.at() is notably slower when you make a lot of calls. You may want to abstract the access so you can swap from .at() to [] once the code is debugged, or switch based on a debug compiler flag, or the like. It's a good idea to use it, though, at least for your first pass.

I also favor (int, int) over [][]. The [][] is kind of cool and polished, but getting there is convoluted and does not really gain anything.
Thank you Dave.

I wonder whether the overhead is worth it, compared to sticking with a one-dimensional vector and calculating the position the plain old way via offsets, because the subscripting would cost three method invocations each time (if I have counted correctly).
Such method invocations are generally optimized away (inlined).
What you want to avoid is doing the math twice. Ideally your assembly result will just compute one offset one time and go there; if you make it too convoluted, it may end up doing it twice. A function like
type access(int r, int c) {return thing[r*thing.w+c];}
should be inlinable down to one computation; the compiler will replace the function call with
thing[r*thing.w+c]

Compiler optimizers are very smart now; I am still learning things they do that amaze me. Like anything else, you can try writing it a couple of ways and look at the generated assembly to see whether it is being smart or dumb. It may well be smart enough to unravel the [][] path into one inline computation as well. You can't second-guess when it will be smart or dumb; you just have to peek at it when you care. This is the core of a matrix library: if accessing elements is slow, it's doomed from the first line, so it does matter here.
I am quite confident that the performance penalty of .at() is minor compared to the cost of reading memory, especially once you take branch prediction into consideration.

#include <iostream>
#include <vector>
#include <chrono>

//timers.
typedef std::chrono::steady_clock::time_point TIMER_U;
typedef double TIMER_RESULT_U;

TIMER_U get_time(){
  return std::chrono::steady_clock::now();
}

template<size_t resolution = 1000>
TIMER_RESULT_U timer_delta(const TIMER_U &start, const TIMER_U &end)
{
  return std::chrono::duration_cast< std::chrono::duration<TIMER_RESULT_U, std::ratio<1, resolution>>>(end-start).count();
}

int main()
{

    std::vector<int> v;
    v.resize(1000000,1);

    int count = 0;
    TIMER_U t1 = get_time();
    for(int i = 0; i < v.size(); ++i)
    {
        count += v.at(i);
    }
    TIMER_U t2 = get_time();

    std::cout <<"time: "<<timer_delta(t1,t2)<<", i:"<<count<<std::endl;

    t1 = get_time();
    for(int i = 0; i < v.size(); ++i)
    {
        count += v[i];
    }
    t2 = get_time();

    std::cout <<"time: "<<timer_delta(t1,t2)<<", i:"<<count<<std::endl;

    return 0;
}


from godbolt web compiler
x64 clang -O2 -std=c++17
time: 0.000378, i:1000
time: 0.000248, i:2000

from mingw same settings on my machine, if I ran it a couple times, it seems to be the same speed and just varies.
time: 0.0033, i:1000
time: 0.00385, i:2000

same case with more, sometimes it ties, sometimes the other is faster.
with 1000000
time: 1.05532, i:1000000
time: 1.14991, i:2000000

*time is in milliseconds
*these aren't averaged, and it's not a very formal benchmark; I didn't do hot and cold tests, and it is probably bad practice to have all the work done in one executable.
That is almost a 10% difference on the largest one. It's small in wall-clock time, agreed.
The compiler might be able to optimize the at() call since it knows the index is in bounds. Also, you need to account for the time taken by the test itself.

To get around these, I put the actual at() and [] calls into functions in a separate compilation unit. I also included a function that just returns 1:
#include <vector>

int doAt(std::vector<int> &v, size_t i) {
    return v.at(i);
}

int doBracket(std::vector<int> &v, size_t i) {
    return v[i];
}

int doNil(std::vector<int> &v, size_t i) {
    return 1;
}


Then I modified poteto's code to call these functions:
#include <iostream>
#include <vector>
#include <chrono>

//timers.
typedef std::chrono::steady_clock::time_point TIMER_U;
typedef double TIMER_RESULT_U;

TIMER_U get_time(){
  return std::chrono::steady_clock::now();
}

template<size_t resolution = 1000>
TIMER_RESULT_U timer_delta(const TIMER_U &start, const TIMER_U &end)
{
  return std::chrono::duration_cast< std::chrono::duration<TIMER_RESULT_U, std::ratio<1, resolution>>>(end-start).count();
}

extern int doAt(std::vector<int> &v, size_t i);
extern int doBracket(std::vector<int> &v, size_t i);
extern int doNil(std::vector<int> &v, size_t i);

int main()
{

    std::vector<int> v;
    v.resize(1000000,1);

    int count = 0;
    TIMER_U t1 = get_time();
    for(int i = 0; i < v.size(); ++i)
    {
        count += doAt(v,i);
    }
    TIMER_U t2 = get_time();

    std::cout <<"at() time: "<<timer_delta(t1,t2)<<", i:"<<count<<std::endl;

    t1 = get_time();
    for(int i = 0; i < v.size(); ++i)
    {
        count += doBracket(v,i);
    }
    t2 = get_time();

    std::cout <<"[] time: "<<timer_delta(t1,t2)<<", i:"<<count<<std::endl;

    t1 = get_time();
    for(int i = 0; i < v.size(); ++i)
    {
        count += doNil(v,i);
    }
    t2 = get_time();

    std::cout <<"nil time: "<<timer_delta(t1,t2)<<", i:"<<count<<std::endl;

    return 0;
}


Compiling with g++ -O4 and running several times, I get these times:
at() time: 1.6251, i:1000000
[] time: 1.4352, i:2000000
nil time: 1.2668, i:3000000

Subtracting the nil time gives the time difference for at() vs. []:
at() time: 0.3583
[] time: 0.1684

So it appears that at() is actually more than twice as slow.
at()
call    std::chrono::_V2::steady_clock::now()
        mov     rbx, rax
        movsxd  rsi, dword ptr [rsp + 8]
        mov     rax, qword ptr [rsp + 16]
        mov     rdx, qword ptr [rsp + 24]
        sub     rdx, rax
        sar     rdx, 2
        cmp     rdx, rsi
        jbe     .LBB1_8
        mov     r14d, dword ptr [rax + 4*rsi]
        call    std::chrono::_V2::steady_clock::now()

[]
call    std::chrono::_V2::steady_clock::now()
        mov     rbx, rax
        movsxd  rax, dword ptr [rsp + 8]
        mov     rcx, qword ptr [rsp + 16]
        mov     r14d, dword ptr [rcx + 4*rax]
        call    std::chrono::_V2::steady_clock::now()

You are correct; I had overlooked a compiler optimization.
I got the same results: it is twice as slow.

This also leads to the same conclusion I reached with optimized memory allocators: giving up the safety isn't really worth a speedup of only about 2x over a plain malloc replacement, since the system malloc usually turns out to be quite well optimized (unless you have a really effective allocator that moves the slowdown away from runtime, like an arena or pool, but those can't replace malloc in every situation).
But overall, it is true that a 2x boost is significant, so if you are looping through an array with a million objects, consider avoiding .at().
Also note that on GCC, even with -g and without sanitizers (I also think gdb helps find OOB somehow, but that might have just been a reverse heisenbug), you will not get the crash you would expect. Same for MSVC with buffer checking (enabled by default in the debug profile). So I still prefer .at(), and if you were looping through an array you would use a range-based for loop instead of .at() if you could.
But overall, it is true that a 2x boost is significant, so if you are looping through an array with a million objects, consider avoiding .at().

Matrix math code tends to spawn a bunch of intermediate steps and iterates over the data frequently. It's rarely one big M*N of a million; it's more often much_smaller * much_smaller over a hundred steps.

Still, now you are seeing what I was seeing with at() use. I still like the idea of using it to debug, though, which I have not typically done.
Topic archived. No new replies allowed.