I have a type with 2 int data members. I want to calculate a GOOD hash value and since std::hash exists I thought I would use it - after all it must be GOOD to be in the std.
However, there is no hash_combine, and no clues as to how to combine the two results into a hash value that is GOOD for my type.
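(For reference, a hash_combine along the lines of the well-known Boost recipe can be written by hand. The sketch below assumes a Point type with two int members; Point and PointHash are illustrative names. It gives a well-distributed hash, but not a collision-free one.)

```cpp
#include <cstddef>
#include <functional>

struct Point { int x; int y; };

// Boost-style combine step; 0x9e3779b9 is the classic 32-bit golden-ratio constant.
inline void hash_combine(std::size_t& seed, std::size_t value)
{
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct PointHash
{
    std::size_t operator()(const Point& p) const
    {
        std::size_t seed = std::hash<int>{}(p.x);
        hash_combine(seed, std::hash<int>{}(p.y));
        return seed;   // well distributed, but not guaranteed collision-free
    }
};
```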
I need a perfect hash - if it isn't perfect then it won't work.
I require that hash(a) == hash(b) only when a == b
I am using a cache (a map) where the key is the hash value. Every time I get a Point, I compute its hash and test for its presence in the cache. If a new Point has a hash value that has already been inserted into the cache, that new Point will not be added. I cannot have this behavior because it loses Points!
I think the key would have to be the Point itself; that seems to be the only foolproof way to ensure all Points are taken into the cache and none is skipped.
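A minimal sketch of that idea, assuming the Point and PointHash from the earlier sketch plus an equality operator: the container compares keys with operator== whenever two hashes collide, so two distinct Points that happen to share a hash value both stay in the cache.

```cpp
#include <unordered_map>

bool operator==(const Point& a, const Point& b)
{
    return a.x == b.x && a.y == b.y;
}

// CachedValue is a placeholder for whatever the cache actually stores.
struct CachedValue { /* ... */ };

// Keyed on the Point itself; hash collisions are resolved by operator==.
std::unordered_map<Point, CachedValue, PointHash> cache;

// cache.emplace(p, value) inserts only if no *equal* Point is present;
// a mere hash collision never causes a Point to be skipped.
```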
Regrettably, I cannot use the perfect hash from @mbozzi because sizeof(size_t) == 4 on my platform.
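(One possible workaround, independent of that answer's details: the two 32-bit coordinates can still be packed into a single uint64_t, which stays injective even when size_t is only 32 bits; it just can't be returned from std::hash. The packed value can then key an ordered container, which never hashes at all. A sketch, reusing the Point type from above:)

```cpp
#include <cstdint>
#include <map>

// Injective packing of the two 32-bit coordinates:
// pack(a) == pack(b) exactly when a == b, independent of the width of size_t.
inline std::uint64_t pack(const Point& p)
{
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.x)) << 32)
         | static_cast<std::uint32_t>(p.y);
}

// Hypothetical cache keyed on the packed value; std::map only compares keys
// with operator<, so no hashing (and no size_t) is involved.
std::map<std::uint64_t, int> cache;
```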
I need a perfect hash - if it isn't perfect then it won't work.
I require that hash(a) == hash(b) only when a == b
Perfect hashes have this property, but the hash function is pre-computed for a fixed set of n inputs, typically by searching for a PRNG seed such that for all i, j with 0 <= i < n and 0 <= j < n, input[i] == input[j] <=> hash(input[i]) == hash(input[j]), and 0 <= hash(input[i]) < n. The trade-off is that for values not contained in input the hash function necessarily produces a collision, so a perfect hash function can't be used as a std::unordered_set replacement: inserting a new value means rebuilding the function, which takes O(n).
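A rough sketch of that construction, under stated assumptions (a fixed, non-empty set of distinct 64-bit inputs, and a brute-force seed search; real perfect-hash libraries use cleverer algorithms, and this search is only practical for small n):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// splitmix64-style mixer used as the seeded hash family.
std::uint64_t mix(std::uint64_t x, std::uint64_t seed)
{
    x += seed + 0x9e3779b97f4a7c15ull;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ull;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebull;
    return x ^ (x >> 31);
}

// Search for a seed that makes hash(x) = mix(x, seed) % n injective on the
// given inputs. The resulting function is perfect for exactly this set; any
// other value may collide, and adding a new value means repeating the search.
std::uint64_t find_perfect_seed(const std::vector<std::uint64_t>& input)
{
    const std::size_t n = input.size();     // assumed non-empty, all distinct
    std::mt19937_64 rng(42);                // arbitrary fixed PRNG seed
    for (;;) {
        const std::uint64_t seed = rng();
        std::unordered_set<std::size_t> slots;
        for (std::uint64_t key : input)
            slots.insert(static_cast<std::size_t>(mix(key, seed) % n));
        if (slots.size() == n)              // every input got its own slot
            return seed;
    }
}
```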
I think the key would have to be the Point itself; that seems to be the only foolproof way to ensure all Points are taken into the cache and none is skipped.
Then you don't want a hash table. You want an std::map. Otherwise you have to deal with a gigantic array that's mostly empty.
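A minimal sketch of that suggestion, assuming a Point with two int members (names are illustrative): the map is keyed on the Point itself, ordered lexicographically, so nothing is ever lost to a hash collision and no huge sparse array is needed.

```cpp
#include <map>
#include <tuple>

struct Point { int x; int y; };

// Strict weak ordering on (x, y); this is all std::map needs.
struct PointLess
{
    bool operator()(const Point& a, const Point& b) const
    {
        return std::tie(a.x, a.y) < std::tie(b.x, b.y);
    }
};

std::map<Point, int, PointLess> cache;   // int stands in for the cached value
```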
Perfect hashing is usually reserved for a static set of known data, so that you can analyze the data ahead of time and guarantee the result.
You can TRY using GUID-type algorithms, but your table cannot be large enough to use a full-blown GUID (these tend to be 128 bits or something similarly large; you can't make a 2^128-entry table on most normal computers).
And then you get into 'discrete math' territory that boils down to an obvious fact: the fewer slots in your table, the higher the probability of a collision. A good hash algorithm gives you decent odds of avoiding collisions when the table is 4-5 times the size of the number of items being put into it, but you are in Vegas on this (playing the odds without an assured result): it won't work every time, eventually the input data will collide, and you will have some sort of bug.
Bottom line is this: perfect hashing only works for a very small amount of data in the universe. If your data has a unique 'key' that can be converted to a unique array index, in an array small enough to be useful on current hardware, then you can do it. E.g., if you have a tiny company and your customer IDs run from 0 to 1000, yes, you can hash that into an array of [1001] perfectly. And given modern computing power, you can even do that up to a few million entries without sweating too hard. When your table starts getting into the GB range (that is about 50 million slots, assuming a 64-bit integer key and a 64-bit pointer to the data item make up the table), it is starting to stretch a pretty good PC as of 'today', and a lot of users would be annoyed at one program burning up over a GB for more than a few moments.
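A sketch of that last case, direct addressing with the key as the array index (Record and the 0..1000 bound are illustrative):

```cpp
#include <optional>
#include <string>
#include <vector>

struct Record { std::string name; };

// One slot per possible ID; the ID is its own perfect hash.
std::vector<std::optional<Record>> table(1001);

// Caller is expected to keep id within 0..1000; at() throws otherwise.
void store(int id, Record r) { table.at(id) = std::move(r); }
bool present(int id)         { return table.at(id).has_value(); }
```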