Hi guys,
I was wondering which way was the fastest for counting the bits set in a 64-bit longlongint
I have been developing a chess engine lately and for performance reasons I need an extremely fast popcount method. I know it has been discussed endlessly here and on StackExchange, but I was unable to find a conclusive answer. Right now I am using: __builtin_popcountll(long)
I am on Windows and using MinGW with Dev-C++.
#include <bitset>
#include <type_traits>
#include <limits>
#include <iostream>
#include <ctime>
#include <iomanip>
template < typename T > std::size_t popcnt( const T& v )
{
static_assert( std::is_integral<T>::value && std::is_unsigned<T>::value, "an unsigned integral type expected" ) ;
return std::bitset< std::numeric_limits<T>::digits >(v).count() ;
}
int main()
{
constint NVALUES = 1'000'000'000 ;
unsigned int r = 0 ;
const auto start = std::clock() ;
for( unsigned long long n = 0 ; n < NVALUES ; ++n ) r += popcnt(n) ;
const auto end = std::clock() ;
const unsigned long long popcnts_per_sec = NVALUES / double(end-start) * CLOCKS_PER_SEC ;
std::cout << "about " << std::fixed << std::setprecision(0) << popcnts_per_sec << " popcnts per second, "
<< std::setprecision(3) << 1'000'000'000.0 / popcnts_per_sec << " nanoseconds per popcnt\n\n" ;
return r ;
}
[ 0.000000] Detected 2992.516 MHz processor.
LLVM
------
with hardware support for popcnt
about 684931506 popcnts per second, 1.460 nanoseconds per popcnt
without hardware support for popcnt
about 306748466 popcnts per second, 3.260 nanoseconds per popcnt
GNU
------
with hardware support for popcnt
about 657894736 popcnts per second, 1.520 nanoseconds per popcnt
without hardware support for popcnt
about 207900207 popcnts per second, 4.810 nanoseconds per popcnt
I see. I'm on Windows and right now I have no intention to port the engine to other operating systems. If I was, I would have to use your suggestion.
Thank you!