Efficient data storage.

My project requires a large distance matrix as data, which is accessed very frequently throughout the project. Because of its importance I'm a bit worried about efficiency; any overhead on a single access will quickly add up to a major slowdown across the whole program. My question, then, is how to find a good way to store it.

Due to some design choices, the matrix is currently stored in a class, which is instantiated once at the point of read-in. After that, every class that needs access to it (read: every class) holds a reference to that data object. The class itself is, at this point, a POD struct: a few ints, an array of N ints, and an NxN array of ints.

My questions:
a) Is there any overhead to 2D arrays? I imagine two indices require more lookup time than a single index, but is that more costly than calculating a 1D index? I'm guessing it's not, but I can't think of a good way to measure the difference.
b) At this point, the distance data is symmetric (M[i][j] == M[j][i]), meaning I could halve my memory usage without loss of information. However, accessing M[j][i] would have to be translated into accessing M[i][j], most likely via a MIN/MAX function, which is quite heavy compared to a direct access. Is there an efficient way to calculate the 1D index of a 'diagonal-half-matrix' access?
c) The value of N is only known at run-time. I'm currently working with an upper bound on N so I can use static arrays. Is there a way for me to determine when the overhead of dynamic arrays becomes preferable to the memory wasted when n << N?
d) Are there any "behind the scenes" problems I should worry about? Which are they, how do I find/measure them, and how do I solve them? (This question is fueled by the very confusing paragraph I found here: http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Code_optimization/Faster_operations#Array_cells_size)

Any other pieces of advice are always welcome; data management is going to be a frequent issue in this project and I'd like to gather as many insights as possible.

Please do mind that I'm not very interested in "it's bad practice!" answers. However, if you can tell me why it's bad practice (pros and cons) so I can decide for myself, I'd very much love to hear it.
For some reason I can't edit my post, but this:

M[i][j] == M[j][i]

was mangled by the italic tag [ i] and showed up as M[j] == M[j][i]. It must be:

M[1][2] == M[2][1]

as in, the indices are reversed.
I would suggest storing half of the matrix as a single contiguous block.

There is a formula that lets you locate an element from its row and column indices:

http://www.codeguru.com/cpp/cpp/algorithms/general/article.php/c11211/TIP-Half-Size-Triangular-Matrix.htm
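As a rough sketch of that idea (my own version, not necessarily the exact formula from the linked article): a packed row-major upper-triangular layout for a symmetric NxN matrix of ints, where (i, j) and (j, i) map to the same slot.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Packed upper-triangular storage for a symmetric N x N matrix.
// Only N*(N+1)/2 elements are stored instead of N*N.
struct SymMatrix {
    std::size_t n;
    std::vector<int> data;

    explicit SymMatrix(std::size_t n_) : n(n_), data(n_ * (n_ + 1) / 2) {}

    // Flat index of element (i, j); assumes i, j < n.
    std::size_t index(std::size_t i, std::size_t j) const {
        if (i > j) std::swap(i, j);          // the MIN/MAX step from the post
        return i * n - i * (i + 1) / 2 + j;  // start of row i, plus column offset
    }

    int&       operator()(std::size_t i, std::size_t j)       { return data[index(i, j)]; }
    const int& operator()(std::size_t i, std::size_t j) const { return data[index(i, j)]; }
};
```

For n = 4 this packs the 10 distinct elements into indices 0 through 9; writing through (3, 1) is visible when reading (1, 3).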

An advantage of a solid block of data is "cache coherence":

http://cplusplus.com/forum/beginner/58927/

I think this may be more important than the time spent on the arithmetic, but obviously you would have to test to find out.


That's the formula I came up with myself, but the sheer number of calculations for a single element access is... painful. I think I'll stick to a doubled matrix for now.

An NxN 2D array is still a solid block of data, isn't it?
Yes it is.

I'm not sure a static array is any faster than a dynamic array.

If the dynamic array is contiguous ( vector<float>(n*n) ) then I think it should be the same.
Worrying about any of this is a huge waste of effort until and unless you profile your prototype application and determine that element access is a bottleneck.

a) Is there any overhead to 2D arrays? I imagine two indices require more lookup time than a single index, but is that more costly than calculating a 1D index? I'm guessing it's not, but I can't think of a good way to measure the difference.

Difference between reading from a[r][c] and reading from a[r*COLS+c]? The machine code produced by the compiler is (usually) identical.

Example
volatile int target1;
volatile int row = 100;
volatile int col = 15;
int main()
{
    int a[1024][500];
    target1 = a[row][col];
}
        movl    row(%rip), %eax
        movl    col(%rip), %edx
        cltq
        imulq   $500, %rax, %rax
        movslq  %edx, %rdx
        addq    %rdx, %rax
        movl    -120(%rsp,%rax,4), %eax
        movl    %eax, target1(%rip)


volatile int target2;
volatile int row = 100;
volatile int col = 15;
int main()
{
    int b[1024*500];
    target2 = b[row*500+col];
}
        movl    row(%rip), %eax
        movl    col(%rip), %edx
        imull   $500, %eax, %eax
        addl    %edx, %eax
        cltq
        movl    -120(%rsp,%rax,4), %eax
        movl    %eax, target2(%rip)


Typical C++ practice for a matrix class is to store a 1D data container (std::valarray is best for performance, followed by std::array/C array, std::vector, std::deque), and to perform the calculation (r*COLS+c) in the member access function, such as operator().
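A minimal sketch of that pattern (class and member names are my own), with the 1D container and the r*COLS+c calculation hidden behind operator():

```cpp
#include <cstddef>
#include <valarray>

// Row-major matrix wrapper over a 1D std::valarray, as described above.
// Dimensions are fixed at construction; operator() does r*cols + c.
struct Matrix {
    std::size_t cols;
    std::valarray<int> data;

    Matrix(std::size_t rows, std::size_t cols_)
        : cols(cols_), data(rows * cols_) {}   // zero-initialized

    int&       operator()(std::size_t r, std::size_t c)       { return data[r * cols + c]; }
    const int& operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};
```

Swapping std::valarray for std::vector here changes nothing else in the interface, so the container choice can be profiled in isolation.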

b) At this point, the distance data is symmetric (M[i][j] == M[j][i]), meaning I could halve my memory usage without loss of information. However, accessing M[j][i] would have to be translated into accessing M[i][j], most likely by a MIN/MAX function, which is quite heavy compared to a direct access.

Calculating the index into a packed triangular matrix is indeed a bit complicated, but how heavy is "quite heavy"? Profile your solution. Then take an existing matrix library (boost.ublas, Eigen, etc.) and profile its code.
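For a rough first comparison before reaching for a real profiler, a minimal timing sketch can help (my own helper names; no warm-up runs or statistics, so treat the numbers with suspicion):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Times a single invocation of f in nanoseconds.
template <class F>
long long time_ns(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

// Example workload: sum a flat n*n matrix; the returned value keeps the
// compiler from optimizing the loop away entirely.
long long sweep(const std::vector<int>& m, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}
```

Run the same workload against the full-matrix and packed-triangular access schemes and compare the reported times.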

c) The value of N is only known at run-time. I'm currently working with an upper bound on N so I can use static arrays. Is there a way for me to determine when the overhead of dynamic arrays becomes preferable to the wasted memory by having an n << N?

Define "overhead of dynamic arrays".

struct M {
    int b[1024][500];
    int operator()(int row, int col) const {
            return b[row][col];
    }
};
volatile int target;
volatile int row = 100;
volatile int col = 15;
int main()
{
    M m;
    target = m(row, col);
}
        movl    col(%rip), %edx
        movl    row(%rip), %eax
        movslq  %edx, %rdx
        cltq
        imulq   $500, %rax, %rax
        addq    %rdx, %rax
        movl    -120(%rsp,%rax,4), %eax
        movl    %eax, target(%rip)


vs:

#include <vector>
struct M {
    std::vector<int> b;
    M() : b(1024*500) {}
    int operator()(int row, int col) const {
            return b[row*500+col];
    }
};
volatile int target;
volatile int row = 100;
volatile int col = 15;
int main()
{
    M m;
    target = m(row, col);
}
        movl    col(%rip), %ecx
        movl    row(%rip), %edx
        imull   $500, %edx, %edx
        addl    %ecx, %edx
        testq   %rax, %rax
        movslq  %edx, %rdx
        movl    (%rax,%rdx,4), %edx
        movl    %edx, target(%rip)


d) Are there any "behind the scenes" problems I should worry about? Which are they, how do I find/measure them, and how do I solve them?
There are many behind the scenes things happening when accessing RAM: data alignment, page faults, cache misses, cache invalidation, false sharing, etc. Don't worry about them until and unless you've profiled a matrix library code, and your code, and were not satisfied with either.

(This question is fueled by the very confusing paragraph I found here: http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Code_optimization/Faster_operations#Array_cells_size)

Yes, if I use 512 instead of 500 for the number of columns in my examples above, the index calculation becomes faster, but, as the wiki rightfully notes, this gain gets you nothing if you run out of cache.
Thanks for that, Cubbi. I think I got caught up in premature-optimization paranoia.

One question: you say std::valarray has better performance than a C-array. How exactly is that even possible? Isn't it by definition a C-array with sugar on top?
valarrays have special arrangements with the (good) C++ compilers. Using them is like using C arrays with C's restrict keyword.

§26.6.2.4 [valarray.access]/4
the expression &a[i] != &b[j] evaluates as true for any two arrays a and b and for any
size_t i and size_t j such that i is less than the length of a and j is less than the length of b. This
property indicates an absence of aliasing and may be used to advantage by optimizing compilers


PS: Most matrix libraries today use expression templates instead of valarrays (rightfully, too), but check out http://www.zdnet.com/videos/parallelism/valarray/295695 for something modern and practical
I'm afraid you went beyond my level of understanding there, so I'll just take your word for it. The valarrays solved some other problems already, so I'll give them a try. I've also changed my bound on N to a bound on log2(N), ensuring I'll always use powers of 2 as sizes, so the accessor becomes cheaper.
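For what it's worth, the power-of-two trick amounts to letting the multiply in r*cols + c become a shift; a sketch (kShift and kCols are made-up names, and note Cubbi's earlier caveat that padding 500 up to 512 can cost you cache):

```cpp
#include <cstddef>

// With a power-of-two column count, row * cols is just a left shift.
constexpr std::size_t kShift = 9;                       // log2 of the column count
constexpr std::size_t kCols  = std::size_t{1} << kShift; // 512 columns

// Same result as r * kCols + c whenever c < kCols: the low kShift bits
// of (r << kShift) are zero, so OR and ADD coincide.
constexpr std::size_t flat_index(std::size_t r, std::size_t c) {
    return (r << kShift) | c;
}
```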

Small unrelated question:
In one of my other topics I mentioned my problem of 'unglobalizing' my data. I did it by grouping all my data in a single POD struct, of which one object is created at run-time, and giving all other classes a reference to this object. The problem is that a reference member must be initialized in the constructor, meaning I can't have an "empty" (default) constructor. As a result, I couldn't dynamically allocate arrays of objects of any class that holds the reference. I imagine I'll run into the same problem with valarrays of such objects, so how do I get around it?
did it by grouping all my data in a single POD struct, of which one object is created at runtime, and giving all other classes a reference to this object. The problem is that a reference must be initialized in the constructor, meaning I can't have an "empty" constructor. As a result, I couldn't dynamically allocate arrays of objects of any class that has the reference


A class holding a reference as a non-static member is non-default-constructible and non-assignable: it can't be used in any container directly (you can construct a vector since vectors don't require default-constructible to construct, but it won't be very useful since resize and other operations require the assignable property). Consider holding a pointer or a reference_wrapper.
Guess it's back to making it a pointer (6th time I changed my mind in the last few hours).

Big thanks to Cubbi and mik2718. I'm going to mark this topic as solved, but further input on the subject is always welcome!