Hashtables: Any optimization when object

Forum

Forum
General C++ Programming
Hashtables: Any optimization when object

Hashtables: Any optimization when objects are not deleted ?

I recently read about perfect hashtables which guarentee O(1) worst case time. But I was crushed on reading that it requires that all the objects be known in advance.

Is there some optimizationn which gives better performance (worst case, or amortized, or probabilistic) if it is known that objects once stored into the hastable are never deleted ? We do NOT know the final number of objects that need to be stored.

mtrenkmann (22)

I think if you do not know the number of objects to be stored, there is no optimization. GCCs implementation std::tr1::unordered_map<key, value> uses a default bucket count of 10 in its default c'tor. If then the loading factor reaches a certain value the entire hashtable will be resized and rehashed, which requires one memory allocation and many hash function invocations. To avoid this you have to know the number of objects to initialize your table appropriately...

Perfect hashtables are a bit different. They ensure that no hash collisions occur, which is only possible if they know the complete key set prior. Then a special tuned hash function will be build from that key set. In your case perfect hash functions are definitely impossible.

The important question is when do you need best performance, for insert or lookup operations? If the latter one use std::tr1::unordered_map<key, value>, otherwise use the tree-based std::map<key, value>.

DexterMorgan (107)

Thanks.
How does

 std::tr1::unordered_map<key, value>

differ from

std::map<key, value>

? I have only used the latter.

closed account (oz10RXSz)

The latter gives you O(log n) access time, while the first, non-standard one gives you O(1) amortized access time. There are some mixed tree-like hashtables which give deterministic access time by slightly sacrifying the average access time performance (it is indeed O(log n), but with a high logarithm base like 64 - so in practice - constant). But they do not need to rehash, so this is a huge win in real time systems.

Last edited on

DexterMorgan (107)

xoreaxeax, is the 64 base you mentioned for std::map<key, value> ?

closed account (oz10RXSz)

No. Javolution's FastMap uses it, but I do not know of any C++ libs that do.

Last edited on

mtrenkmann (22)

@DexterMorgan The two classes differ in the internal data structure they use.

std::map<key, value> uses a kind of http://en.wikipedia.org/wiki/Balanced_tree.
std::tr1::unordered_map<key, value> is backed by an array and uses a hash function for addressing.

Another difference is that key-value-pairs in std::map stay sorted by its key type applying operator<() by default. If your key type is std::string for instance and you iterate the map, all key-value-pairs comes in lexicographical order. In the case of std::tr1::unordered_map<key, value> key-value-pairs are randomly distributed over an array (depending on the hash function), so if you iterate it, all values come in random order too.

Topic archived. No new replies allowed.