map<string, int> is faster than map<int, string>?

The following example times how long it takes to build a map<int, string> and a map<string, int> both of
500 000 elements, where the strings are generated randomly. What I don't get is the results after running this example. It is faster to build a map<string, int> rather that a map<int, string>. Please check the comments in the code to better understand my question.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
#include <ctime>
#include <algorithm>
#include <limits>
#include <map>
#include <iostream>
#include <string>

using namespace std;

inline int rand_int(int max) { return rand() % max; }

inline int rand_int(int min, int max) { return rand_int(max - min) + min; }

string rand_string()
{
	const int iterations = rand_int(100);
	string result;
	result.reserve(iterations);
	for(int i = 0; i < result.size(); ++i)
	{
		result[i] = char(rand_int(int(numeric_limits<char>::min()),
				          int(numeric_limits<char>::max())));
	}
	return result;
}

int main()
{
	clock_t start_m1 = clock();
	map<int, string> m1;
	for(int i = 0; i < 500000; ++i)
	{
		m1[i] = rand_string();
		// map orders its elements according to the Key
		// The Key here is int
		// The for-loop generates sorted ints
		// hence the Keys are already sorted
	}
	clock_t stop_m1 = clock();
	cout << "The time it took to build map<int, string> is "
		<< double(stop_m1 - start_m1) / CLOCKS_PER_SEC << " seconds" << '\n';

	clock_t start_m2 = clock();
	map<string, int> m2;
	for(int i = 0; i < 500000; ++i)
	{
		m2[rand_string()] = i;
		// The Key here is string
		// Randoms strings are generated that are not sorted
		// So to build the map in a sorted order,
		// bool operator<(const string& _Left, const string& _Right)
		// has to be called a lot of times
		// to insert the elements in a sorted order
		// and comparing strings should take more time to compute
		// than comparing ints
		// So many calls to that function means that
		// building m2 should take a longer time than m1
		// also considering that the Keys of m1 are already sorted
	}
	clock_t stop_m2 = clock();
	cout << "The time it took to build map<string, int> is "
		<< double(stop_m2 - start_m2) / CLOCKS_PER_SEC << " seconds" << '\n';

	cout << "Please enter a character to exit:" << "\n";
	char ch = 0;
	cin >> ch;

	return 0;
}


Result:
1
2
The time it took to build map<int, string> is 23.974 seconds
The time it took to build map<string, int> is 8.928 seconds


Should it be faster to build map<int, string> than map<string, int>?

Thanks in advance to anyone who replies!
Last edited on
To my knowledge, std::map and std::set are implemented as self-balancing Binary Search Trees of some sort (sometimes Red-Black Trees).

This means that when you insert a new element, the tree may get rebalanced.
So maybe, because you insert the int keys "in order", you get a bad case of perpetual balancing?

I'm not sure what I said above is correct, so I too am interested in what others have to say.
Entering the int keys "in order" should make no difference to the insert operation, due to the input not actually knowing that it is "in order". However, even when I optimized it for being in order, though it reduced the time significantly, it still wasn't as fast as map<string, int>.

http://ideone.com/ayXyEK
I see a bug in the original code.

1
2
3
4
5
6
7
8
9
10
11
12
string rand_string()
{
	const int iterations = rand_int(100);
	string result;
	result.reserve(iterations); // string still has a length of 0, use std::string::resize()
	for(int i = 0; i < result.size(); ++i)
	{
		result[i] = char(rand_int(int(numeric_limits<char>::min()),
				          int(numeric_limits<char>::max())));
	}
	return result; // you return a string which is considered actually is empty
}


http://www.cplusplus.com/reference/string/string/reserve/
http://www.cplusplus.com/reference/string/string/resize/
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#include <string>
#include <random>
#include <algorithm>
#include <vector>
#include <set>
#include <ctime>
#include <map>
#include <iostream>

std::string random_string( std::mt19937& twister )
{
    static std::uniform_int_distribution<std::size_t> rand_size(10,100) ;
    static std::uniform_int_distribution<char> rand_char_distr( 1, 127 ) ;
    const auto rand_char = [&twister] { return rand_char_distr(twister) ; };

    std::string str( rand_size(twister), 0 ) ;
    std::generate( str.begin(), str.end(), rand_char ) ;
    return str ;
}

std::vector< std::pair<std::size_t,std::string> >
                      generate_data( std::mt19937& twister, std::size_t n )
{
    std::set< std::pair<std::size_t,std::string> > s ;
    while( s.size() < n ) s.emplace( twister(), random_string(twister) ) ;

    std::vector< std::pair<std::size_t,std::string> > v( s.begin(), s.end() ) ;
    std::shuffle( v.begin(), v.end(), twister ) ;
    return v ;
}

struct cpu_timer
{
    ~cpu_timer()
    {
        auto end = std::clock() ;
        std::cout << double( end - begin ) / CLOCKS_PER_SEC << " secs.\n" ;
    };

    const std::clock_t begin = std::clock() ;
};

int main()
{
    std::mt19937 rng( std::time(nullptr) ) ;
    constexpr std::size_t N = 1024 * 512 ;
    const auto test_data = generate_data( rng, N ) ;
    std::size_t cnt ;

    {
        std::cout << "map with key == int : " ;
        std::map< std::size_t, std::string > m ;
        cpu_timer t ;
        for( const auto& p : test_data ) m.emplace( p.first, p.second ) ;
        for( const auto& p : test_data ) m.emplace( p.first, p.second ) ; // duplicates

        cnt = 0 ;
        for( const auto& p : test_data ) if( m.find(p.first) != m.end() ) ++cnt ; // look up
    }
    std::cout << "cnt: " << cnt << '\n' ;

    {
        std::cout << "map with key == string: " ;
        std::map< std::string, std::size_t > m ;
        cpu_timer t ;
        for( const auto& p : test_data ) m.emplace( p.second, p.first ) ;
        for( const auto& p : test_data ) m.emplace( p.second, p.first ) ; // duplicates

        cnt = 0 ;
        for( const auto& p : test_data ) if( m.find(p.second) != m.end() ) ++cnt ; // look up
    }
    std::cout << "cnt: " << cnt << '\n' ;
}

http://coliru.stacked-crooked.com/a/9a3064676f2463c9
g++-4.8 -std=c++11 -O2 -pedantic-errors -Wall -Wextra -Werror main.cpp && ./a.out
map with key == int : 2.37 secs.
cnt: 524288
map with key == string: 5.37 secs.
cnt: 524288
Edit: Much too late.
Last edited on
@JLBorges and @Catfish666 and @NT3

Thank you so much, especially to JLBorges!
Topic archived. No new replies allowed.