Ok, have you ever heard of hashing? I presume that if you think about this problem for another day or two, you'll reinvent the hashmap. You're on the right track. ;)
Actually, the optimal way to do this grouping is to hash into n buckets, with n chosen so that the whole hashmap fits in the L1 data cache - so n < 1000 is probably a good choice. Then hash each bucket again, this time with counting. Such a two-phase algorithm can group over a million distinct elements of any type with almost sequential access to RAM and most operations on data that sits in L1.
If more elements are needed, add another phase - with 3-phase grouping you could handle over a billion elements.
I'll post code for this soon.
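In the meantime, here's a minimal sketch of the two-phase idea (just an illustration, not the code I'm promising - the uint32_t element type, the 256 buckets and the multiplicative hash are arbitrary choices):

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Phase 1: scatter elements into buckets by hash (append-only writes,
// so memory access stays close to sequential).
// Phase 2: count each bucket with a small hashmap that stays cache-resident.
std::vector<std::pair<uint32_t, std::size_t>>
two_phase_count(const std::vector<uint32_t>& data) {
    constexpr std::size_t kBuckets = 256;  // tune so one bucket's map fits in L1

    std::vector<std::vector<uint32_t>> buckets(kBuckets);
    for (uint32_t x : data) {
        uint32_t h = x * 2654435761u;      // cheap multiplicative hash (example)
        buckets[h >> 24].push_back(x);     // top 8 bits -> bucket index 0..255
    }

    std::vector<std::pair<uint32_t, std::size_t>> result;
    for (const auto& b : buckets) {
        std::unordered_map<uint32_t, std::size_t> counts;
        counts.reserve(b.size());          // upper bound; distinct-per-bucket is usually much smaller
        for (uint32_t x : b) ++counts[x];
        for (const auto& kv : counts) result.emplace_back(kv.first, kv.second);
    }
    return result;                         // (value, count) pairs, grouped
}
```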
I think the main reason our solution is slow is that as the range of the numbers grows, the underlying tree in the map gets deeper and deeper, so searching for a key takes more time.
That's the cause of the slowdown in the std::map version. But my version uses a hashmap, and hashmap lookups are O(1), so the whole algorithm would be linear if memory were equally fast regardless of the working-set size.
The slowdown in my version is caused by:
1. Resizing the hashmap (rehashing). This is costly and doesn't happen in a tree map. We could avoid it by giving a size hint (see the snippet after this list), but then you must know the approximate number of distinct items in advance, which isn't always the case. Either way it's only a constant factor.
2. Poor memory locality due to the random data in the array. Small hashmaps are fast because they fit in the L1 cache; once the map grows past the L2 cache size, every lookup means a cache miss - many, many CPU cycles spent just waiting for memory.
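Both points are easy to see with a toy benchmark like this one (just a sketch - the sizes and keys are my choices, and the exact numbers depend on your machine):

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <unordered_map>
#include <vector>

int main() {
    std::mt19937 rng(42);
    for (std::size_t n : {1u << 10, 1u << 15, 1u << 20, 1u << 24}) {
        std::unordered_map<uint32_t, uint32_t> m;
        m.reserve(n);                      // point 1: size hint, no rehashing while filling
        std::vector<uint32_t> keys(n);
        for (auto& k : keys) { k = rng(); m[k] = 1; }

        // Point 2: random lookups - once the table no longer fits in L1/L2,
        // most of the time below is spent waiting on cache misses.
        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t sum = 0;
        for (uint32_t k : keys) sum += m.find(k)->second;
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
        std::printf("n = %zu: %.1f ns per lookup (checksum %llu)\n",
                    n, ns, (unsigned long long)sum);
    }
}
```

On a typical machine the per-lookup time jumps noticeably between the small sizes and the largest one, even though the number of operations per lookup is the same - that's the cache effect, not the algorithm.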