unordered_set hash specialization

closed account (oGhfSL3A)
Hello!

I am specializing the hash function of std::unordered_set template class, but I want to use its size() as part of the function. How could I do this?

This code illustrates the problem:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
namespace std{
    template<>
    class hash<item>{
    public:
        size_t operator ()(const item& i) const{
            return i.getId()%50;	//I want to use items.size() rather than 50
        }
    };
}

int main(){
    std::unordered_set<item> items;
    return 0;
}


Thanks!
Why do you want to do that? The hash should always stay the same for the same element. If the hash changes when the unordered_set changes you would not be able to find the already inserted items again.
closed account (oGhfSL3A)
Because I want multiple instances of unordered_set<item>, each one with a constant size that might be different.
closed account (oGhfSL3A)
Oh wait I think this will work:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
namespace std{
    template<>
    class hash<item>{
    protected:
        unsigned size_;
    public:
        size_t operator ()(const item& c) const{
            return c.getId()%size_;
        }
    };
}

class specHash: public std::hash<item>{
public:
    specHash(){
        size_=50;
    }
};

int main(){
    std::unordered_set<item, specHash> cont;
    return 0;
}
Still don't understand why you want to use the size when computing the hash. I don't think using % will improve the hash, quite the opposite. Why not make use of std::hash to create a hash of the ID, that will probably give you a good hash.

1
2
3
4
5
6
7
8
9
namespace std{
    template<>
    class hash<item>{
    public:
        size_t operator ()(const item& i) const{
            return std::hash<decltype(i.getId())>()(i.getId());
        }
    };
}

Last edited on
closed account (oGhfSL3A)
The reason I am avoiding std::hash is because my item IDs are already created sequentially, but with large gaps between them like:

4, 5, 6, 7, 32, 33, 34, 35, 36, 49, 50, 51, 52 etc.

I am assuming std::hash would just keep them in the same order and waste memory with the gaps. I might be wrong?
closed account (oGhfSL3A)
On a related note, would it be in any way possible to look up an item based on the ID? Like cont.find(12) for the item with an ID of 12.
I can overload the hash function to take an unsigned int:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
namespace std{
    template<>
    class hash<gComponent>{
    protected:
        const unsigned size_;
    public:
        hash(unsigned size): size_(size){}
        size_t operator ()(const gComponent& c)const{
            return c.getObjectID()%size_;
        }
        size_t operator ()(const unsigned& i)const{
            return i%size_;
        }
    };
}


but the find method wouldn't use it as it only takes key type as a parameter.

Would it be best to just use an std::unordered_map?
Last edited on
I don't think the hash affects how much memory is used but it can have a big impact on performance.

unordered_set and unordered_map allocates a number of buckets. The number of buckets is usually nothing you need to worry about. The bucket array will be resized automatically as you add more elements, to try and have a good balance between memory usage and lookup speed.

The hash is used to decide in what bucket the element should be stored. I guess it uses something as simple as
 
hashValue % numberOfBuckets
to decide which bucket to store the element.

Finding a bucket by index is very fast. Finding an element in a bucket is slow if there are many elements in the bucket. With a good hash function the number of elements in each bucket is likely to be close to 1 so it will usually be very fast. If the hash function is bad you might end up with a lot of elements in the same bucket and that will be slow to search through.

So the reason I think you should not be doing %size in your hash function is:
1. It is unnecessary because the unordered_set/map is already doing something similar.
2. If you happened to increase the size of the unordered_set/map beyond the 50 that you used in the hash you would limit the number of buckets that will be used, leading to worse performance.

I'm no expert in designing good hash functions so I have no idea if using the ID number as-is is good or not (I guess it's not that bad). I mentioned using std::hash because I thought it was a safe bet, because it has been implemented to work well with any distribution of values. To know for sure you should test the performance using different hashes and see which one perform best.
would it be in any way possible to look up an item based on the ID? Like cont.find(12) for the item with an ID of 12.
Would it be best to just use an std::unordered_map?

Yes, use a map.
closed account (oGhfSL3A)
That is helpful, Thanks! I will use std::hash.
Topic archived. No new replies allowed.