Actually it calculates an inner product on each character adding it to the last result (see http://www.cplusplus.com/reference/numeric/inner_product/).
65437 is the nearest prime to 2^16. In fact I never need the full range of 16 bit, therefore 65437 different hashes are enough.
Empirically it seems - on unique strings - to be collision free. But if not, can somebody construct me an example where not.
This is the classic problem of trying to fit too many things into a fixed number of slots. You cannot represent every possible string with just a single 16-bit integer.
The absolute best case scenario is 2^16 unique strings before you have a collision.
To reproduce a collision, just hash (2^16)+1 strings. At least 2 of them will collide (though you're likely to have a collision much sooner):
1 2 3 4 5 6
for(int i = 0; i < 0x10001; ++i)
{
stringstream ss;
ss << i;
hash( ss.str() );
}
EDIT:
Also, given the simplicity of your algorithm, I can tell that "ab" and "ba" will collide. So there's a test case for you.
Ups, you are true :-[ 2 Byte string already start to collide.
You're also true with the fact "There is no such thing as a collision free hash.", I was just too lax in describing my problem, wanted to say "almost" collision free.
The absolute best case scenario is 2^16 unique strings before you have a collision.
This condition is ensured. But:
In your example the pitfall is the commutativity of the plus and multiply operator. That pitfall I encountered in my first try, but I openly admit, I didn't used that small strings to test.
But minus falls out because it can result in negative numbers and division can result in real numbers.
Any ideas how I can get at least "less" collisions?