As I understand it, you want a hashing function to use a string as a key in a hash table?
The one I am using works as follows:
- Loop through the string and take each character as an unsigned 8bit integer (aka uint8_t).
- binary OR each in reverse on a uint32_t shifted by position % 4 * 8 bit to the left.
- Every four characters take the part and add it to the sum, determined by the position % 3.
0: sum += part >> 2
1: sum ^= part << 4
2: sum |= part >> 1
- If any part is left when the string is finished (last part has less than 4 characters), add it to the sum shifted right by 4 bit.
The result is a nice uint32_t hash.
The generation of 100,000,000 random C-Strings resulted in the following:
------------+-------------+----------+-------------+----------+-------
Type | Unique rand | Quota | Unique Hash | Quota | Result
------------+-------------+----------+-------------+----------+-------
C-String | 100,000,000 | 100.00 % | 98,723,200 | 98.72 % | Passed
------------+-------------+----------+-------------+----------+------- |
When used in a chained hash table and an open hash table with 50,000 keys, each, the maximum number of entries in any chain was 3, and the maximum number of hops to find an empty place was 4. So clustering and secondary clustering seem to be alright.
But I have to admit that I limit open hash tables to a maximum load factor of 0.8, and my implementation uses a "Robin Hood insertion" approach.
Btw: I tried the "plotting" to find heavy clustering, but recording the maximum chain length / hops in a textfile is far more convenient.
Edit: I tried some "tried-and-true" algorithms, but for hashing strings they all went rather bad. However, if you want to hash integer numbers, I'd recommend the methods described by Thomas Wang and Robert Jenkins. Unfortunately Thomas's site is down (
http://www.concentric.net/~ttwang/tech/inthash.htm) but Roberts is still there. (
http://burtleburtle.net/bob/hash/integer.html)
A good read would be
http://burtleburtle.net/bob/hash/ btw.