map<string, int> is faster than map<int, string>?

Dec 25, 2013 at 12:14pm
The following example times how long it takes to build a map<int, string> and a map<string, int> both of
500 000 elements, where the strings are generated randomly. What I don't get is the results after running this example. It is faster to build a map<string, int> rather that a map<int, string>. Please check the comments in the code to better understand my question.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
#include <ctime>
#include <algorithm>
#include <limits>
#include <map>
#include <iostream>
#include <string>

using namespace std;

inline int rand_int(int max) { return rand() % max; }

inline int rand_int(int min, int max) { return rand_int(max - min) + min; }

string rand_string()
{
	const int iterations = rand_int(100);
	string result;
	result.reserve(iterations);
	for(int i = 0; i < result.size(); ++i)
	{
		result[i] = char(rand_int(int(numeric_limits<char>::min()),
				          int(numeric_limits<char>::max())));
	}
	return result;
}

int main()
{
	clock_t start_m1 = clock();
	map<int, string> m1;
	for(int i = 0; i < 500000; ++i)
	{
		m1[i] = rand_string();
		// map orders its elements according to the Key
		// The Key here is int
		// The for-loop generates sorted ints
		// hence the Keys are already sorted
	}
	clock_t stop_m1 = clock();
	cout << "The time it took to build map<int, string> is "
		<< double(stop_m1 - start_m1) / CLOCKS_PER_SEC << " seconds" << '\n';

	clock_t start_m2 = clock();
	map<string, int> m2;
	for(int i = 0; i < 500000; ++i)
	{
		m2[rand_string()] = i;
		// The Key here is string
		// Randoms strings are generated that are not sorted
		// So to build the map in a sorted order,
		// bool operator<(const string& _Left, const string& _Right)
		// has to be called a lot of times
		// to insert the elements in a sorted order
		// and comparing strings should take more time to compute
		// than comparing ints
		// So many calls to that function means that
		// building m2 should take a longer time than m1
		// also considering that the Keys of m1 are already sorted
	}
	clock_t stop_m2 = clock();
	cout << "The time it took to build map<string, int> is "
		<< double(stop_m2 - start_m2) / CLOCKS_PER_SEC << " seconds" << '\n';

	cout << "Please enter a character to exit:" << "\n";
	char ch = 0;
	cin >> ch;

	return 0;
}


Result:
1
2
The time it took to build map<int, string> is 23.974 seconds
The time it took to build map<string, int> is 8.928 seconds


Should it be faster to build map<int, string> than map<string, int>?

Thanks in advance to anyone who replies!
Last edited on Dec 25, 2013 at 12:43pm
Dec 25, 2013 at 12:39pm
To my knowledge, std::map and std::set are implemented as self-balancing Binary Search Trees of some sort (sometimes Red-Black Trees).

This means that when you insert a new element, the tree may get rebalanced.
So maybe, because you insert the int keys "in order", you get a bad case of perpetual balancing?

I'm not sure what I said above is correct, so I too am interested in what others have to say.
Dec 25, 2013 at 12:46pm
Entering the int keys "in order" should make no difference to the insert operation, due to the input not actually knowing that it is "in order". However, even when I optimized it for being in order, though it reduced the time significantly, it still wasn't as fast as map<string, int>.

http://ideone.com/ayXyEK
Dec 25, 2013 at 2:07pm
I see a bug in the original code.

1
2
3
4
5
6
7
8
9
10
11
12
string rand_string()
{
	const int iterations = rand_int(100);
	string result;
	result.reserve(iterations); // string still has a length of 0, use std::string::resize()
	for(int i = 0; i < result.size(); ++i)
	{
		result[i] = char(rand_int(int(numeric_limits<char>::min()),
				          int(numeric_limits<char>::max())));
	}
	return result; // you return a string which is considered actually is empty
}


http://www.cplusplus.com/reference/string/string/reserve/
http://www.cplusplus.com/reference/string/string/resize/
Last edited on Dec 25, 2013 at 2:09pm
Dec 25, 2013 at 2:27pm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#include <string>
#include <random>
#include <algorithm>
#include <vector>
#include <set>
#include <ctime>
#include <map>
#include <iostream>

std::string random_string( std::mt19937& twister )
{
    static std::uniform_int_distribution<std::size_t> rand_size(10,100) ;
    static std::uniform_int_distribution<char> rand_char_distr( 1, 127 ) ;
    const auto rand_char = [&twister] { return rand_char_distr(twister) ; };

    std::string str( rand_size(twister), 0 ) ;
    std::generate( str.begin(), str.end(), rand_char ) ;
    return str ;
}

std::vector< std::pair<std::size_t,std::string> >
                      generate_data( std::mt19937& twister, std::size_t n )
{
    std::set< std::pair<std::size_t,std::string> > s ;
    while( s.size() < n ) s.emplace( twister(), random_string(twister) ) ;

    std::vector< std::pair<std::size_t,std::string> > v( s.begin(), s.end() ) ;
    std::shuffle( v.begin(), v.end(), twister ) ;
    return v ;
}

struct cpu_timer
{
    ~cpu_timer()
    {
        auto end = std::clock() ;
        std::cout << double( end - begin ) / CLOCKS_PER_SEC << " secs.\n" ;
    };

    const std::clock_t begin = std::clock() ;
};

int main()
{
    std::mt19937 rng( std::time(nullptr) ) ;
    constexpr std::size_t N = 1024 * 512 ;
    const auto test_data = generate_data( rng, N ) ;
    std::size_t cnt ;

    {
        std::cout << "map with key == int : " ;
        std::map< std::size_t, std::string > m ;
        cpu_timer t ;
        for( const auto& p : test_data ) m.emplace( p.first, p.second ) ;
        for( const auto& p : test_data ) m.emplace( p.first, p.second ) ; // duplicates

        cnt = 0 ;
        for( const auto& p : test_data ) if( m.find(p.first) != m.end() ) ++cnt ; // look up
    }
    std::cout << "cnt: " << cnt << '\n' ;

    {
        std::cout << "map with key == string: " ;
        std::map< std::string, std::size_t > m ;
        cpu_timer t ;
        for( const auto& p : test_data ) m.emplace( p.second, p.first ) ;
        for( const auto& p : test_data ) m.emplace( p.second, p.first ) ; // duplicates

        cnt = 0 ;
        for( const auto& p : test_data ) if( m.find(p.second) != m.end() ) ++cnt ; // look up
    }
    std::cout << "cnt: " << cnt << '\n' ;
}

http://coliru.stacked-crooked.com/a/9a3064676f2463c9
g++-4.8 -std=c++11 -O2 -pedantic-errors -Wall -Wextra -Werror main.cpp && ./a.out
map with key == int : 2.37 secs.
cnt: 524288
map with key == string: 5.37 secs.
cnt: 524288
Dec 25, 2013 at 3:16pm
Edit: Much too late.
Last edited on Dec 25, 2013 at 3:17pm
Dec 25, 2013 at 3:43pm
@JLBorges and @Catfish666 and @NT3

Thank you so much, especially to JLBorges!
Topic archived. No new replies allowed.