Array Performance issue

Below code runs at different speed, when array is of huge size.
Which one of these is faster?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

struct B {
    int i[32];
};

int Sample1(B* array, int Sizearray) {
    int sum = 0;
    for (int a = 0; a<Sizearray; ++a) {
        for(int b =0; b<32; ++b) {
             sum += array[a].i[b];
        }
    }
    return sum;
}


int Sample2(B* array, int Sizearray) {
    int sum = 0;
    for (int b=0; b<32; ++b) {
        for (int a =0; a<Sizearray; ++a) {
            sum += array[a].i[b];
         }
    }
    return sum;
}
Probably the first one, due to cache performance with the array i[] being in contiguous memory.

Any of:
- using iterators;
- calling std:accumulate()
- using threads or OpenMP
would probably speed it up further.
Last edited on
@OP
Which one of these is faster?

Why not time them?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
#include <iostream>
#include <ctime>
using namespace std;

// denver2020 code  *****************

struct B {
    int i[32];
};

int Sample1(B* array, int Sizearray) {
    int sum = 0;
    for (int a = 0; a<Sizearray; ++a) {
        for(int b =0; b<32; ++b) {
             sum += array[a].i[b];
        }
    }
    return sum;
}


int Sample2(B* array, int Sizearray) {
    int sum = 0;
    for (int b=0; b<32; ++b) {
        for (int a =0; a<Sizearray; ++a) {
            sum += array[a].i[b];
         }
    }
    return sum;
}

// end of denver2020 code  *****************


void timeIt( int fn( B*, int ), B* A, int n, int &sum, double &t )
{
   clock_t tstart = clock();
   sum = fn( A, n );
   clock_t tfinish = clock();
   t = double( tfinish - tstart ) / CLOCKS_PER_SEC;
}


int main()
{
   const int SIZE = 100000;
   B *A = new B[SIZE];
   for ( int r = 0; r < SIZE; r++ )
   {
      for ( int c = 0; c < 32; c++ ) A[r].i[c] = c;
   }

   int sum1, sum2;
   double time1, time2;
   timeIt( Sample1, A, SIZE, sum1, time1 );
   timeIt( Sample2, A, SIZE, sum2, time2 );
   
   cout << "Sample1: "<< sum1 << " in time " << time1 << " s\n";
   cout << "Sample2: "<< sum2 << " in time " << time2 << " s\n";
   delete [] A;
}



Sample1: 49600000 in time 0.001931 s
Sample2: 49600000 in time 0.027526 s


Sample1 is more than 10 times faster.
Last edited on
Thanks lastchance, for your quick clarification.
you can probably unroll it to a single loop instead of a nested one and gain a microscopic bit of speed by exploiting the simple nature of the struct and c++ memory alignment. But if this is an example from a bigger problem, where the struct has more data, that would not work: it works because the struct is basically nonexistent, making it (in memory) just a 2-d array (this may also depend on struct alignment, but you can toggle that around the loops if it is an issue).
Topic archived. No new replies allowed.