Worrying about any of this is a huge waste of effort until and unless you profile your prototype application and determine that element access is a bottleneck.
a) Is there any overhead to 2D arrays? I imagine two indices require more lookup time than a single index, but is that more costly than calculating a 1D index? I'm guessing it's not, but I can't think of a good way to measure the difference.
Difference between reading from a[r][c] and reading from a[r*COLS+c]? The machine code produced by the compiler is (usually) identical.
Example
volatile int target1;
volatile int row = 100;
volatile int col = 15;

int main()
{
    int a[1024][500];
    target1 = a[row][col];
}
movl row(%rip), %eax
movl col(%rip), %edx
cltq
imulq $500, %rax, %rax
movslq %edx, %rdx
addq %rdx, %rax
movl -120(%rsp,%rax,4), %eax
movl %eax, target1(%rip)
volatile int target2;
volatile int row = 100;
volatile int col = 15;

int main()
{
    int b[1024*500];
    target2 = b[row*500+col];
}
movl row(%rip), %eax
movl col(%rip), %edx
imull $500, %eax, %eax
addl %edx, %eax
cltq
movl -120(%rsp,%rax,4), %eax
movl %eax, target2(%rip)
Typical C++ practice for a matrix class is to store a 1D data container (std::valarray is best for performance, followed by std::array/C array, std::vector, std::deque), and to perform the calculation (r*COLS+c) in the member access function, such as operator().
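As a minimal sketch of that practice, with the dimensions chosen at run time (the class name Matrix and its members are placeholders, not taken from any particular library):

#include <cstddef>
#include <vector>

// Row-major matrix stored in a single 1D container.
// Illustrative only: names and the choice of std::vector are assumptions.
class Matrix {
    std::size_t rows_, cols_;
    std::vector<int> data_;
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // The r*COLS+c calculation lives in the access function.
    int& operator()(std::size_t r, std::size_t c) {
        return data_[r * cols_ + c];
    }
    int operator()(std::size_t r, std::size_t c) const {
        return data_[r * cols_ + c];
    }
};

With that in place, m(i, j) reads exactly like a 2D access, and the underlying container can be changed without touching the callers.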
b) At this point, the distance data is symmetric (M[i][j] == M[j][i]), meaning I could halve my memory usage without loss of information. However, accessing M[j][i] would have to be translated into accessing M[i][j], most likely by a MIN/MAX function, which is quite heavy compared to a direct access.
Calculating the index into a packed triangular matrix is indeed a bit complicated, but how heavy is "quite heavy"? Profile your solution. Then take an existing matrix library (boost.ublas, Eigen, etc), and profile their code.
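For a concrete sense of what that index calculation looks like, here is a sketch of one common packing scheme (lower triangle, row-major); the formula i*(i+1)/2 + j is standard, but the class and member names are placeholders:

#include <cstddef>
#include <utility>
#include <vector>

// Symmetric matrix storing only the lower triangle (i >= j):
// n*(n+1)/2 elements instead of n*n.
class PackedSymmetric {
    std::vector<double> data_;
public:
    explicit PackedSymmetric(std::size_t n) : data_(n * (n + 1) / 2) {}

    double operator()(std::size_t i, std::size_t j) const {
        if (i < j) std::swap(i, j);          // the MIN/MAX step: one compare and swap
        return data_[i * (i + 1) / 2 + j];   // packed triangular index
    }
    double& operator()(std::size_t i, std::size_t j) {
        if (i < j) std::swap(i, j);
        return data_[i * (i + 1) / 2 + j];
    }
};

The extra work per access is one comparison, a possible swap, and the i*(i+1)/2 arithmetic; whether that is "quite heavy" relative to your distance computations is exactly what profiling will tell you.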
c) The value of N is only known at run-time. I'm currently working with an upper bound on N so I can use static arrays. Is there a way for me to determine when the overhead of dynamic arrays becomes preferable to the memory wasted by having n << N?
Define "overhead of dynamic arrays". Compare a static 2D member against a heap-backed std::vector:
struct M {
    int b[1024][500];
    int operator()(int row, int col) const {
        return b[row][col];
    }
};

volatile int target;
volatile int row = 100;
volatile int col = 15;

int main()
{
    M m;
    target = m(row, col);
}
movl col(%rip), %edx
movl row(%rip), %eax
movslq %edx, %rdx
cltq
imulq $500, %rax, %rax
addq %rdx, %rax
movl -120(%rsp,%rax,4), %eax
movl %eax, target(%rip)
vs:
#include <vector>

struct M {
    std::vector<int> b;
    M() : b(1024*500) {}
    int operator()(int row, int col) const {
        return b[row*500+col];
    }
};

volatile int target;
volatile int row = 100;
volatile int col = 15;

int main()
{
    M m;
    target = m(row, col);
}
movl col(%rip), %ecx
movl row(%rip), %edx
imull $500, %edx, %edx
addl %ecx, %edx
testq %rax, %rax
movslq %edx, %rdx
movl (%rax,%rdx,4), %edx
movl %edx, target(%rip)
d) Are there any "behind the scenes" problems I should worry about? Which are they, how do I find/measure them, and how do I solve them?
There are many behind-the-scenes things happening when accessing RAM: data alignment, page faults, cache misses, cache invalidation, false sharing, etc. Don't worry about them until and unless you've profiled both a matrix library's code and your own, and were not satisfied with either.
Yes, if I use 512 instead of 500 for the number of columns in my examples above, the index calculation becomes faster, but, as the wiki rightly notes, this gain gets you nothing if you run out of cache.
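If you want to see that index-calculation effect yourself, here is a hedged sketch (padding the 500-column rows to a 512-int stride is my reading of the suggestion, not something shown above); the only change is the stride, and the compiler can then replace the imull with a shift:

#include <vector>

// Logically 1024 x 500, but each row is padded to a 512-int stride,
// so row*STRIDE+col can compile to a shift-and-add instead of an imull.
struct PaddedM {
    static constexpr int STRIDE = 512;   // power-of-two stride (assumption for illustration)
    std::vector<int> b;
    PaddedM() : b(1024 * STRIDE) {}
    int operator()(int row, int col) const {
        return b[row * STRIDE + col];
    }
};

The cost is 1024 * 12 padding ints (about 48 KB), which is precisely the kind of cache-footprint trade-off warned about above.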