Putting output of the bitset member funtion count into a vector is slow

Here is a part of my C++ code where I have problems:

std::bitset<64>a;
std::bitset<64>b;
std::bitset<64>c;
int bit_count=0;
std::vector<int> vec(SIZE,0);
for (i=1;i<NUM;i++)
{
%% I do here some operations on a and b (a and b will have bits that are set)
c=a^b;
bit_count=(int) c.count(); %%%% LINE !
vec[i]=bit_count; %%%% LINE2 2
}

My problem is the following:
- (i) if I comment LINE 1 and LINE 2 the code runs in approx. 109ms; (ii) if I only comment LINE2 the code runs approx. in 115 ms; (iii) if I comment LINE 1 and bit_count=0 the code runs approx 130 ms; (iv) if both lines (LINE 1 and 2) are not commented, the code runs in approx. 350 ms.

Why is the code slow when I use LINE1 and LINE2? I do not find any acceptable explanation.

Note that I also tried vec,push_back(bit_count), and it is also slow.
casting reduces the performance . and also putting the value in the vector is also reduces performance .

correct me if i am wrong . !!!
This kind of benchmarking is not easy to get right. What likely happens when you comment out LINE2 is that, without *use* for bit_count, the compiler does not assign to it, and does not evaluate c.count() (except maybe for the last c.count() called in the loop).

For example, on my system, with both LINE1 and LINE2, the code produced from the following loop:

1
2
3
4
5
6
7
8
    for (int i=1;i<NUM;i++)
    {
        a[std::rand()%64] = true;
        b[std::rand()%64] = true;
        c=a^b;
        bit_count=(int) c.count(); // line 1
        vec[i]=bit_count; // line 2
    }


is

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
.L12:
        movq    %rdx, (%rsp)
        call    rand
        movl    %eax, %ecx
        movl    $1, %eax
        andl    $63, %ecx
        salq    %cl, %rax
        orq     %rax, 8(%rsp)
        call    rand
        movq    (%rsp), %rdx
        movl    %eax, %ecx
        movl    $1, %eax
        andl    $63, %ecx
        salq    %cl, %rax
        orq     %rax, %rdx
        movq    8(%rsp), %rax
        xorq    %rdx, %rax
        popcntq %rax, %rax
        movl    %eax, 0(%r13,%rbx)
        addq    $4, %rbx
        cmpq    $400000000, %rbx // that's how big my vector was in the test
        jne     .L12


but with LINE2 commented out,

1
2
3
4
5
6
7
8
    for (int i=1;i<NUM;i++)
    {
        a[std::rand()%64] = true;
        b[std::rand()%64] = true;
        c=a^b;
        bit_count=(int) c.count(); // line 1
//        vec[i]=bit_count; // line 2
    }


it is only

1
2
3
4
5
.L12:
        call    rand
        call    rand
        subl    $1, %ebx
        jne     .L12


Always examine the assembly output when micro-optimzing to learn what really happens.


One way to disable this optimization is to declare bit_count as volatile int bit_count; -- the compiler is not allowed to optimize out any read or write accesses to volatile objects. With that change, on my system, commenting out LINE2 makes no difference whatsoever (the loop iteration takes about 22 nanoseconds either way), but commenting out LINE1 makes the loop run noticably faster (takes about 19 nanoseconds per iteration)
Last edited on
Sorry just to reply now.

Thank you very much for your help. It is true. The problem is that I used the optimization flag during
the compilation. In this case, if the compiler notices that only the assignment in the last "for loop" iteration is used, then it only does one assignment corresponding to the last operation.

In my case, the operation
bit_count=(int) c.count();
is only performed for the last iteration if
vec[i]=bit_count;
is commented.

If I do not use the optimization flag, the run times are the same.
Topic archived. No new replies allowed.