I know i can find notes and tutorials on hash functions and hash tables on Google. However all the notes and tutorials i have found are not very clear. They all use words like "collision" and other jargon.
I also understood from the notes i have that i need a pseudo random number generator. That also is unclear.
Does anyone have notes, tutorials or code snippet that one can use as an introduction to hash functions and hash tables.
A hash function is merely a function that takes an argument and produces a hopefully unique number for that argument.
As an example, if you wanted to hash a std::string, you could add together all of the values of all the characters in the
string and return that. The problem with that hash function is that the strings "Hello World" and "World Hello" would
return the same value.
A hash table is a map which maps the hopefully unique number to the actual object that produced the value. When I
want to find the location of a given object in the hash table, all I need to do is run the hash function on the object to
generate the unique number, then check to see if the map contains that number as one of the keys.
The problem with a hash table is: what if two distinct objects return the same hash value (unique number), such as
in my example? This is what is called a "collision". If I attempt to insert both objects into the hash table, the second
one will overwrite the first.
There are generally three solutions to that problem:
1) Each entry in the hash table actually contains a list of elements that hashed to that value (this is called "chaining");
2) If a collision occurs, instead increment the unique number produced by the second object by one, then retry the insert. If that collides to, increment it again, and so forth until insertion succeeds
3) Make a better hash function that is unlikely to produce collisions and hope that one never occurs
#include <map>
#include <numeric> // For std::accumulate
#include <stdint.h> // For uint32_t
#include <set>
#include <string>
// This hash function adds together the values of each character in the string
uint32_t hash_fn( const std::string& s ) {
return std::accumulate( s.begin(), s.end(), 0U );
}
typedef std::map< uint32_t, std::string > HashTable;
// This hash insert function returns false if a collision occurs and does not actually insert the new element
bool hash_insert( HashTable& table, const std::string& str ) {
return table.insert( std::make_pair( hash_fn( str ), str ) ).first;
}
// This hash insert function keeps incrementing by one until the insert succeeds
void hash_insert_increment( HashTable& table, const std::string& str ) {
uint32_t hash = hash_fn( str );
while( !table.insert( std::make_pair( hash, str ) ).first )
++hash;
}
// This hash function resolves by chaining. Note we need a new hash table data struture
typedef std::map< uint32_t, std::set< std::string > > ChainedHashTable;
void hash_insert_chaining( ChainedHashTable& table, const std::string& str ) {
// Whether this insert succeeds for fails is irrelevant; either way str is in the hash table
table[ hash_fn( str ) ].insert( str );
}
For a general introduction on how hash tables work, see http://en.wikipedia.org/wiki/Hash_table. The basic idea is to develop a "hash function" that for each input value will generate a concise output value that can be used as an array index to identify the original value.