If I had understood correctly, hash tables are basically an array of linked lists. So currently I'm trying to write my own hash table and linked list template classes in order to understand how it exactly works, tough I may stick to unordered_map and list containers from the standard library. But aren't linked list an inefficient way of storing data? I mean they aren't indexed and I got to iterate through all previous elements to access the element I want. Wouldn't it be more efficient to use a dynamic vector instead of a list when doing the chaining?
When thinking about hash tables, you have to consider breadth (hash array size) verses depth (number of entries on any linked list). The idea is to make the hash table sufficiently large so that you get a small number of entries on any particular linked list, thereby minimizing the amount of time to search any linked list.
Consider a hash array size which is equal to or greater than the expected number of entries. If the hash algorithm is sufficiently randomized, you would expect an average of one entry per linked list. This is a very efficient retrieval. Compute the hash for the desired item, follow the forward pointer from the hash table to the item. The only entries that would have more than one item would be where two items resulted in the same hash value.
Vectors would be less efficient (in the general case) since vectors have to be reallocated and copied periodically as elements are added.
tough I may stick to unordered_map and list containers from the standard library.
Good idea!
But aren't linked list an inefficient way of storing data?
I mean they aren't indexed and I got to iterate through all previous elements to access the element I want.
Jup...
Accessing the Data is the Problem.
If you just want to dump a few things in there without ever looking in there Lists are okey....
If you want to iterate over them just stop and use a vector.
Wouldn't it be more efficient to use a dynamic vector instead of a list when doing the chaining?
Wouldn't it be more efficient to use a dynamic vector instead of a list when doing the chaining?
Adding/removing object from list is O(1). Adding to vector might be O(n), and deleting is generally O(n) too.
So not really.
Accessing the Data is the Problem.
If you just want to dump a few things in there without ever looking in there Lists are okey....
If you want to iterate over them just stop and use a vector.
It is locality of nodes themself is the problem. Main bottleneck with list is iteration: data prefetch is often cannot preload next node and you would suffer full cost of random memory access. Accessing object itself through iterator is fast, as required memory is probably already in cache.