std::set behaviour with comparison function

As per my info, std::set is implemented as a self-balancing binary search tree.

I m trying to analyze the behaviour of comparison function i.e. how it is being called, since that typically governs the efficiency of the program.

Please visit the following program and my queries underneath it:
--------------------
#include <set>
#include <iostream>
#include <stdio.h>

class Message
{
public:
int m_seq;
Message(int num):m_seq(num){};
~Message(){};
};

struct Message_cmp
{
bool operator()(const Message & s1, const Message & s2) const
{
printf("Message_cmp %s cmp func called...s1:%d s2:%d\n", __FUNCTION__, s1.m_seq, s2.m_seq);
return (s1.m_seq-s2.m_seq)>0;
}
};

std::set <Message, Message_cmp> set_db;

int main(void)
{
Message msg1(1), msg2(2), msg3(3);
set_db.insert(msg1);
printf("inserted %d\n", __LINE__);
set_db.insert(msg2);
printf("inserted %d\n", __LINE__);
set_db.insert(msg3);
printf("inserted %d\n", __LINE__);
return 0;
}

-----------------------------------------------------------------------
Output:
inserted 28
Message_cmp operator() cmp func called...s1:2 s2:1
Message_cmp operator() cmp func called...s1:2 s2:1
inserted 30
Message_cmp operator() cmp func called...s1:3 s2:1
Message_cmp operator() cmp func called...s1:3 s2:2
Message_cmp operator() cmp func called...s1:3 s2:2
inserted 32
-----------------------------------------------------------------------
Q1: During the second call to the insert function, why does it result in the comparison function being called twice..... similarly why is the comparison function called thrice in third insertion above? Assuming the std::set is implemented as a self-balancing binary search tree, the second and third insertion should lead to the comparison function being called only once!
I've changed you code. It's more readable for me.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#include <set>
#include <iostream>
#include <stdio.h>

class Message
{
public:
    typedef std::set <Message> MessageDB;
    Message(int num):m_seq(num){};
    ~Message(){};
    bool operator<(const Message & rhs) const
    {
        printf("Message_cmp %s cmp func called...this:%d new:%d\n", __FUNCTION__, m_seq, rhs.m_seq);
        return (m_seq-rhs.m_seq)>0;
    }
private:
    int m_seq;
};

int main(void)
{
    Message::MessageDB set_db;

    Message msg1(2), msg2(3), msg3(4), msg4(1);
    set_db.insert(msg1);
    printf("inserted %d\n", __LINE__);
    set_db.insert(msg2);
    printf("inserted %d\n", __LINE__);
    set_db.insert(msg3);
    printf("inserted %d\n", __LINE__);
    set_db.insert(msg4);
    printf("inserted %d\n", __LINE__);
    return 0;
}


About you quastion. std::set can keep only one uniq value and as I thisk stl library has template operator==, and it may be
1
2
3
4
5
6
7
8
9
template <typename T>
bool operator==(const T& val1, const T& val2)
{
    if(val1 < val2)
          return false;
    if(val2 < val1)
          return false;
     return true;
}


And as you see operator< is called twice.
Last edited on
@Denis: But that obviously isn't happening, because then the output would show s1 and s2 being
reversed, and it doesn't.

@nyrahul: Here is the implementation of std::set::insert( const value_type& ):

1
2
3
4
5
6
7
      std::pair<iterator,bool>
      insert(const value_type& __x)
      {
	std::pair<typename _Rep_type::iterator, bool> __p =
	  _M_t.insert_unique(__x);
	return std::pair<iterator, bool>(__p.first, __p.second);
      }


And _M_t is an Rb_tree, so here is the implementation of insert_unique:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
  template<typename _Key, typename _Val, typename _KeyOfValue,
           typename _Compare, typename _Alloc>
    pair<typename _Rb_tree<_Key, _Val, _KeyOfValue,
			   _Compare, _Alloc>::iterator, bool>
    _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
    insert_unique(const _Val& __v)
    {
      _Link_type __x = _M_begin();
      _Link_type __y = _M_end();
      bool __comp = true;
      while (__x != 0)
	{
	  __y = __x;
	  __comp = _M_impl._M_key_compare(_KeyOfValue()(__v), _S_key(__x));
	  __x = __comp ? _S_left(__x) : _S_right(__x);
	}
      iterator __j = iterator(__y);
      if (__comp)
	if (__j == begin())
	  return pair<iterator,bool>(_M_insert(__x, __y, __v), true);
	else
	  --__j;
      if (_M_impl._M_key_compare(_S_key(__j._M_node), _KeyOfValue()(__v)))
	return pair<iterator, bool>(_M_insert(__x, __y, __v), true);
      return pair<iterator, bool>(__j, false);
    }


You can see key compares happening inside the while loop, and then it appears that
a second one will occur on the final if() in the case where you are not inserting at begin().
Appreciate your responses.

@Denis: The updated code is definately more readable. I ll keep that in mind during my next posts.

@jsmith: That was what i was looking for. And that explains it all.

I assumed the set implementation was an AVL tree based but the above snippet shows it is based on RB tree. Both the tree implementation are self balancing.

Also, I have to admit my explanation/problem stmt was not up to the point. I provided insufficient input data which couldn't properly explain my thoughts. The point that was intriguing me was the way in which every insertion in a set leads to traversal of the nodes in the tree. After adding more input to the set and checking the RB tree insertion logic, it all makes proper sense to me.
There is a difference in which nodes are balanced in RB tree vs AVL tree and also the worst case heights of two trees for the same input can be different and AVL trees provide a better worst case height to the tree but the balancing in AVL trees is more intensive than RB trees. Since the balancing technique is different in RB tree, I was confused in the way the nodes were traversed in the STL set during insertion.

Once again thank you guys for the response.
Topic archived. No new replies allowed.