HashMap implementation

Feb 3, 2012 at 5:59pm
Hi:

Following the interface of unordered_map, I'm trying to implement my own hash map HashMap<class KEY, class VALUE>.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
template<class KEY, class VALUE>
class HashMap
{
        class Node
        {
                   KEY key;
                   VALUE val;
        };


	// bucket array; stores lists that contain key-val pairs
	list<Node> * buckets;
        // bucket length
	int nbucket;

	// hash key
	unsigned int hashCode(KEY k);
	// compression function
	unsigned int compFunct(unsigned int);

	public:
	HashMap();
	void insert(KEY, VALUE);
	void erase(KEY);
	virtual ~HashMap();
};


definition in the discussion:
hash code: maps key into some peculiar number
compression function: compresses the peculiar number into the range of the bucket array

I've got a few questions:
1, How do I choose nbucket? Textbook says it'd better be prime(a simple compression function given by the textbook is %nbucket). However, I don't understand why it should be a fixed number. As the hash map grows in size, ideally, the bucket array should increase accordingly to avoid more and more heavy collisions. Though, I can't seem to come up with a good way to implement this "dynamic" growth. Any idea?

2, hash code. This is a hard one, in my opinion. How can I come up with a good yet not too complicated version of harsh code? I'm thinking about using the address of variable key as the harsh code. Does that sound like a good idea? Why and why not?
Edit:
I've found it's not a good idea because I'll always have the same address of the parameter KEY key on stack of int hashCode(KEY);
Here's the update of the question:
How can I convert a variable of any type to an int or something? Essentially, I need to implement the Object.hashCode() in java.
I'm thinking about taking, say, the lowest 32 bits of KEY key, regardless of its type. But << and >> operators don't seem to work all the time. How can I do this?
Below is what I've got (with errors):
1
2
3
4
5
6
7
8
9
10
template<class KEY, class VALUE>
unsigned int HashMap<KEY, VALUE>::hashCode(KEY key)
{
	unsigned int k = key & 0xffffffff; //error: no match for ‘operator&’ in ‘key & 4294967295u’
	k += ~(k<<9);
	k ^= (k>>14);
	k += (k<<4);
	k ^= (k>>10);
	return k;
};


3, I had intended to derive my class HashMap from map in stl, but found map didn't have a virtual destructor, which makes it a bad idea to derive a class from it. I don't understand why those people did this. Is there any strong reason for them to prevent people from inheriting from map?


Many thanks in advance!
Last edited on Feb 4, 2012 at 10:09am
Feb 5, 2012 at 9:14am
Anyone?
Please...
Feb 5, 2012 at 10:35am
> I don't understand why it should be a fixed number. As the hash map grows in size, ideally,
> the bucket array should increase accordingly to avoid more and more heavy collisions.

You would have to trade that off with the overhead of creating new buckets to hold the existing elements, rehashing and copying them etc. Probably not worth it in the general case.


> I'm thinking about using the address of variable key as the harsh code.

This is a terrible, absurd idea; standard library containers use value semantics. Two keys with values that compare equal would hash to different values if you do this. Perhaps you should use std::hash<>, at least to start with. http://en.cppreference.com/w/cpp/utility/hash


> I had intended to derive my class HashMap from map in stl, but found map didn't have a
> virtual destructor, which makes it a bad idea to derive a class from it.

Not really; things would be fine as long as you do not try to use it in an object-oriented manner. Depends on your context, but perhaps you may want to consider using private inheritance.

> I don't understand why those people did this.
> Is there any strong reason for them to prevent people from inheriting from map?

One of the overriding design goals of the standard library is performance - ergo, compile-time polymorphism is preferred to run-time polymorphism. Calling a virtual function to get the size of a container or to increment an iterator is fine for Java; it typically isn't for C++. To repeat, there is nothing amiss in inheriting from a standard container - as long as the programming paradigm is generic, not object-oriented.




Feb 12, 2012 at 3:26pm
@JLBorges:

Thanks a lot for the response!


Perhaps you should use std::hash<>, at least to start with.


It sounds like a good idea for basic types.
However, I am looking for a generic and type-independent way for writing hashcode(), primarily because I'll need to hash a user-defined object into some (ideally)"unique" hash code.

I'm currently thinking about taking a few bits from the object and do some manipulation, as you may have observed in the following code piece:
1
2
3
4
5
6
7
8
9
10
template<class KEY, class VALUE>
unsigned int HashMap<KEY, VALUE>::hashCode(KEY key)
{
	unsigned int k = key & 0xffffffff; //error: no match for ‘operator&’ in ‘key & 4294967295u’
	k += ~(k<<9);
	k ^= (k>>14);
	k += (k<<4);
	k ^= (k>>10);
	return k;
};



Obviously, bitwise operators don't apply to user define types.

And here is the CORE of my question:
Given an object and its memory location, how can I take some bits from the memory location and re-interpret those selected bits as an int so that bitwise operators may apply?

Thanks a lot!!
Feb 12, 2012 at 6:43pm
> primarily because I'll need to hash a user-defined object into some (ideally)"unique" hash code.

You will also need to compare those objects for equality.

> And here is the CORE of my question:
> Given an object and its memory location, how can I take some bits from the memory location
> and re-interpret those selected bits as an int so that bitwise operators may apply?

By using a reinterpret_cast<> of course. Trying to write a hash function this way would be insane.

If you want the keys in your HashMap to be of any hashable and equality comparable type, you need to have at least another two template parameters (ideally with defaults). Look again at the interface of std::unordered_map<> http://en.cppreference.com/w/cpp/container/unordered_map
And spare a moment or two to ponder: why does it need more than two template parameters?
Last edited on Feb 12, 2012 at 6:43pm
Topic archived. No new replies allowed.