Which would be faster?

I want to execute either of the following statements hundreds of millions of times.

1
2
3
4
5
6
7
8
9
10
11
//Would this be faster...
vector< vector< vector<char> > > aa;

char temp = aa[x][y][z];    //Many many times

//than this?
map<string,char> aa;
map<string,char>::iterator it; 

it = aa.find("xyz");    //Many many times


Note: the vector and map containers have the same amount of elements - 64 to be exact. Also, the variables x, y, and z (as used with the map) can only have the value of 'A', 'T', 'G', or 'C'. It might already be apparent, but when used with the 3D vector they can only have a value of 0-3.
Maps use (red black?) trees internally, which have log(n) search speed. Vectors on the other hand have constant access speed, so it doesn't matter how many values are in each vector, the lookup will still be virtually instant. A better comparison would be your vector approach vs an "unordered_map", which in the new standard implements the map with a hash table internally, which I believe also gives constant speed lookups.
map::find() will perform log2(64) string comparisons, each of which reads at most three characters from two strings it is comparing

aa[x][y][z] will perform three indexed pointer dereferences

Of the two, vector of vectors of vectors will be the faster (do measure yourself, though), although like the map, it suffers from the lack of locality. Consider a 1D container.



To be specific, here's how a vector-of-vectors-of-vectors loses to a 1D vector when compiled with gcc 4.6.2:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
std::vector<std::vector<std::vector<char>>> aa
...
temp = aa[x][y][z];
    movl    z(%rip), %ecx // read from z
    movl    y(%rip), %eax // read from y
    movl    x(%rip), %edx // read from x
    movq    aa(%rip), %rsi // read from aa
    cltq
    leaq    (%rax,%rax,2), %rax 
    movslq  %edx, %rdx 
    leaq    (%rdx,%rdx,2), %rdx 
    leaq    (%rsi,%rdx,8), %rdx  
    movq    (%rdx), %rdi       // read from RAM #1 (aa[x])
    movslq  %ecx, %rdx
    leaq    (%rdi,%rax,8), %rax
    movq    (%rax), %rax       // read from RAM #2 (...[y])
    movzbl  (%rax,%rdx), %eax  // read from RAM #3 (...[z])
    movb    %al, temp(%rip)

(note that the three reads are from completely unrelated parts of RAM, since individual vectors were allocated wherever operator new placed them. This means each access may be a cache miss)

vs

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
std::vector<int> aa(DIM1*DIM2*DIM3);
...
temp = aa[x*DIM2*DIM3 + y*DIM3 + z];
        movl    x(%rip), %eax // read from x
        movl    DIM2(%rip), %r8d
        movl    DIM3(%rip), %edi
        movl    y(%rip), %edx // read from y
        movl    DIM3(%rip), %esi
        imull   %r8d, %eax
        movl    z(%rip), %ecx // read from z
        imull   %esi, %edx
        imull   %edi, %eax
        addl    %edx, %eax
        addl    %ecx, %eax
        movq    aa(%rip), %rcx // read from aa
        cltq
        movl    (%rcx,%rax,4), %eax  // read from RAM
        movb    %al, temp(%rip)
Last edited on
Hey! That's a really neat trick, Cubbi! I understood why it's better to use a 1D container but I didn't see how I would convert the x, y, and z numbers into an index. It never occurred to me to think of the concatenated number as a base-four number!
e.g. 2134 = 3910

Many thanks! I know that'll come in handy.
Topic archived. No new replies allowed.