Casting Strings as Ints

Hey,
So I need to hash and my key is a string. What would you say would be the best way to convert my key to an int for my hash function?
Thanks

Last edited on

Handy Andy (5051)

Hello yem,

That would depend on what the string looks like. Give an example of the string.

Andy

jonnin (11340)

Personally I like things that "work like" security hashes. The main theme behind them is that two similar keys, eg "janedoe1234" and "janedoe1235" are radically different from each other to the point that there is no way you can possibly see that from the hash that the input was similar.

This can be done in a variety of ways but keep it simple enough to be efficient and complex enough to get the job done.

If the keys are different enough from each other to begin with it is sometimes OK to just take 64 bits of the input data and cast it as integer, yes. This rarely works in real life, but when it does, be happy and use it if you want to do so. Note that 64 bit is ONLY 8 letters in ascii and less in unicode, so that makes it hard to use this in most normal data!

one thing I have found to work well with minimal effort:
convert the key to a number (however you do this) and then use <random> to generate the true key by seeding it with the number you cooked up. This does what I was saying-- it makes small changes in the keys have very different outputs, and its easy to do. And its assured that if you reseed the generator with the same input, the next 'random' value will be what it was last time, so it works out. This will not cover flaws where 2 inputs have the same output in your original cooked up number, but it WILL cover the flaws where, in my jane doe example, you got like 9876 and 9875 as your output (similar, small difference) as the random back end will scramble that away.

Last edited on

JLBorges (13770)

> What would you say would be the best way to convert my key to an int for my hash function?

The library has a speialisation of std::hash for strings. Use it.
https://en.cppreference.com/w/cpp/string/basic_string/hash

yem (51)

Hey guys,

Handy Andy, the string is supposed to be a discussion topic

jonnin, I might try that

JLBorges I'm going to take a look at that.

I'll let you guys know once I get it worked out!
Thanks guys!

yem (51)

JLBorges

It seems that https://en.cppreference.com/w/cpp/string/basic_string/hash shows me how to use a library function to hash a string (unless I'm not understanding it correctly).

For the purpose of my assignment I have to actually write out the function, using a library function that does the same thing won't cut it.

Am I misunderstanding it?

Thanks

seeplus (6460)

As you have to code your own hash function for a std::string, has any algorithm been discussed/provided. It's very easy to produce a hash function, but it won't be a 'good one'. What is the assignment expecting?

Also https://stackoverflow.com/questions/8317508/hash-function-for-a-string

JLBorges (13770)

> For the purpose of my assignment I have to actually write out the function,
> using a library function that does the same thing won't cut it.

One of the most common hashing algorithms for strings builds a hash value by adding each byte of the string to a multiple of the hash so far. The multiplication spreads bits from the new byte through the value so far; at the end of the loop, the result should be a thorough mixing of the input bytes. Empirically, the values 31 and 37 have proven to be good choices for the multiplier in a hash function for ASCII strings.

- Kernighan and Pike in 'The Practice of Programming'

Their C code:

enum { MULTIPLIER = 31 };

/* hash: compute hash value of  string */
unsigned int hash (char* str)
{
    unsigned int h;
    unsigned char* p:
    
    h = 0:
    for (p = (unsigned char*) str; *p != '\0'; p++) h = MULTIPLIER * h + *p;
    
    return h % NHASH;
}

They point out:

The calculation uses unsigned characters because whether char is signed is not specified by C and C++, and we want the hash value to remain positive.

The hash function returns the result modulo the size of the array. If the hash function distributes key values uniformly, the precise array size doesn't matter. It's hard to be certain that a hash function is dependable, though, and even the best function may have trouble with some input sets, so it's wise to make the array size a prime number to give a bit of extra insurance by guaranteeing that the array size, the hash multiplier, and likely data values have no common factor.

Experiments show that for a wide variety of strings it's hard to construct a hash function that does appreciably better than the one above, but it's easy to make one that does worse.

Last edited on

yem (51)

seeplus,

Well we learned different hash functions for ints in Data Structures II but for this assignment no algorithm has been discussed/provided. We were just told to write a hash function, they didn't specify further.

yem (51)

JLBorges

I like that algorithm. It looks good. I'm going to try that.

Thanks

yem (51)

I got it working! Thanks guys!

seeplus (6460)

Or as C++:

unsigned int hash(const std::string& str, size_t mod)
{
	constexpr unsigned int MULTIPLIER {31};
	unsigned int h {};

	for (char c : str)
		h = MULTIPLIER * h + c;

	return h % mod;
}

Topic archived. No new replies allowed.

C++

Forum

Casting Strings as Ints