Hash Map lookup

Forum

Forum
General C++ Programming
Hash Map lookup

Feb 18, 2011 at 6:24am

Hello All...
I have a map with signature <string, X> (X: typedef struct{double.. }X;)
This map is having 1600C2 (1279200) entries indexed on string.
size of map is: 30MB.
My query is will the lookup will take more time than the splitted ones. mean if i split the entries into 2 or 3 maps. but again splitting will again cause an overhead of selection of specific map.

Feb 18, 2011 at 2:18pm

moorecm (1932)

If the question is 1 map of some struct vs. some maps of single values, go with the map of struct because you would only have to look it up once. Lookup on a map is logarithmic.

Feb 18, 2011 at 2:29pm

jsmith (5804)

Yes, unless you have advance knowledge which map the element you're looking for is in.

Consider a binary tree with 32 elements. It will take O(log N)=6 comparisons to find the element.
Now split that into 4 binary trees, each with 8 elements. It will take O(log N)=3 comparisons to
find the element, or exactly log N =3 comparisons to not find the element in that map.

1/4 of the time you'll find it in the first map, O(log N)=3 comparisons.
1/4 of the time you'll find it in the second, meaning 3 comparisons for the first map, and O(log N) for the second.
1/4 of the time you'll find it in the third, meaning 6 comparisons for the first two maps, and O(log N) for the third.
1/4 of the time you'll find it in the fourth, meaning 9 comparisons for the first three maps, and O(log N) for the last.

Your expected number of comparisons to find an element, knowing no hints as to which map the element might be in, is:

1/4 * O(log N) + 1/4 ( 3 + O(log N) ) + 1/4( 6 + O(log N)) + 1/4( 9 + O(log N)) =
1/4 * ( 3 + 6 + 9 + 4 * O(log N) ) =
18/4 + O(log N) =
4.5 + O(log N) =
4.5 + 3 =
7.5

7.5 > 6.

Last edited on Feb 18, 2011 at 2:31pm

Feb 18, 2011 at 4:10pm

simeonz (490)

Suppose your queries are of the type - what are the values of field Fi in the structs that map from the key string S. Using several maps, specific to each field, will provide no improvement on the speed of those queries. The reason is that the cost of the hashing depends entirely on the type of the keys used, and the collisions are proportional to the density of the entries, which remain the same in all cases. The maintenance time will be obviously worsened. You will have to insert and remove in several maps instead of single one. Although I like to discuss the positive side to any option, I just can't see the upside here.

Regards

Feb 23, 2011 at 6:27am

satya (5)

Thanks for the reply..
well, i implemented the one map only because as jsmith said. it will not improve the time complexity, and i also thought the same...

Topic archived. No new replies allowed.