Comparison function for sorted container searching

Dear C++ and STL experts,
As the title suggests, I have a problem relating to searching a sorted vector using a comparison function. Code compiles OK, but I have a flaw in my logic somewhere for which I can't find a solution. I will outline the code first and pose my questions at the end...

Necessary declarations...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// person-book type
typedef struct {
  UINT32 personID;
  UINT32 bookCount;
} _pbk_t;

typedef vector<UINT32> vuint32;
typedef vector<_pbk_t> vpbk;

// the Compare Book Count function
class CompareBookCount
{
public:
  CompareBookCount(vector<_pbk_t> v) : _v(v) {}
  bool operator()(const UINT32& lhs, const UINT32& rhs) {
    return _v.at(lhs).bookCount < _v.at(rhs).bookCount;
  }
  bool operator()(const UINT32& p) {
    return _v.at(p).bookCount == _bkCount;
  }
  inline void SetBookCount(const UINT32& bkCount) {
    _bkCount = bkCount; 
  };
private:
  vector<_pbk_t> _v;
  UINT32 _bkCount;
};


Now, the implementation...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
void somefunc() 
{
  const UINT32 COUNT = 8;
  vpbk vPerBk;
  vuint32 vPersons;
  vuint32::iterator _it_ui32;
  UINT32 t;

  vPerBk.reserve(COUNT);
  vPersons.reserve(COUNT);

  // generate some sample book counts
  UINT32 bkcount[] = { 4, 1, 3, 6, 3, 1, 7, 4 };
  _pbk_t stn2Msr;

  for (t=0; t<COUNT; ++t)
  {
    vPersons.push_back(t);
    stn2Msr.personID = t;
    stn2Msr.bookCount = bkcount[t];
    vPerBk.push_back(stn2Msr);
  }

  CompareBookCount compareBookCountFunc(vPerBk);
  sort(vPersons.begin(), vPersons.end(), compareBookCountFunc);  

  cout << "vPersons is sorted according to book count in vPerBk vector ..." << endl;
  
  for (t=0; t<COUNT; ++t)
    cout << "vPerBk " << setw(2) << left << t <<
      ": Person " << vPersons.at(t) <<
      " has " << vPerBk.at(vPersons.at(t)).bookCount << " books" << endl;

  const UINT32 bookCount(3);
  compareBookCountFunc.SetBookCount(bookCount);

  _it_ui32 = find_if(vPersons.begin(), vPersons.end(), compareBookCountFunc);

  cout << endl;
  cout << "find_if seach for person with a book count of " << bookCount << "... " << endl;
  if (_it_ui32 != vPersons.end())
    cout << "+ Person " << *_it_ui32 << " has " << vPerBk.at(*_it_ui32).bookCount << " book(s)." << endl;

  _it_ui32 = lower_bound(vPersons.begin(), vPersons.end(), bookCount, compareBookCountFunc);

  cout << endl;
  cout << "Bug: lower_bound seaches for person " << bookCount << ", not for the person with a book count of " << bookCount << "... " << endl;
  if (_it_ui32 != vPersons.end())
    cout << "+ Person " << *_it_ui32 << " has " << vPerBk.at(*_it_ui32).bookCount << " book(s)." << endl;
}


On run time, this produces:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
vPersons is sorted according to book count in vPerBk vector ...
vPerBk 0 : Person 1 has 1 books
vPerBk 1 : Person 5 has 1 books
vPerBk 2 : Person 2 has 3 books
vPerBk 3 : Person 4 has 3 books
vPerBk 4 : Person 0 has 4 books
vPerBk 5 : Person 7 has 4 books
vPerBk 6 : Person 3 has 6 books
vPerBk 7 : Person 6 has 7 books

find_if seach for person with a book count of 3... 
+ Person 2 has 3 book(s).

Bug: lower_bound seaches for person 3, not for the person with a book count of 3... 
+ Person 3 has 6 book(s).


So, as shown above, find_if finds for me exactly what I am searching for - the first person with 3 books. However, lower_bound retrieves person 3.

I understand why lower_bound is doing what it is, but I cannot figure out how to re-write my comparison function to yield the equivalent outcome of find_if.

I would be very grateful if someone would point me in the right direction.

Many thanks in advance, David.
Ok, the problem is that you are misunderstanding how to write comparison functions.

Here's a working example of std::sort. Look carefully to see how my implementation differs from yours. You are misunderstanding what parameters are getting passed to operator().

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
struct Person {
    string firstName;
    string lastName;
};

struct AscendingSortByLastName {
    // Here is the function that will be called by std::sort:
    bool operator()( const Person& lhs, const Person& rhs ) {
        if( lhs.lastName < rhs.lastName ) return true;
        if( rhs.lastName < lhs.lastName ) return false;
        return lhs.firstName < rhs.firstName;
    }
};

// Assume we have a vector of Person and it has some elements in it.
vector< Person > people;

// To sort:
std::sort( people.begin(), people.end(), AscendingSortByLastName() );


Note that the comparator does not have state; it does not contain the vector
to sort. The comparison function is passed two elements from the container
you are sorting (in my case Person) and needs to return whether or not
the first is less than the second.

In your code, your comparison function on std::sort() is taking two UINT32
(because your vector is a vector of UINT32) and is treating these values
as indices into the vector. They are not indices; they are actual values
of two elements in the vector. Since operator< is already defined for UINT32,
you don't even need a comparison function for that std::sort() call.

You have similar problems on your find_if and lower_bound calls.
I think you might be misunderstanding what lower_bound is for. lower_bound uses operator< where find_if uses operator==. Now you are specifying a functor so it will use the functor. However, find_if calls the operator() taking a single int since it is comparing for equality. lower_bound uses operator< so it is using the functor that compares to uint32 values. Do you see the difference? The point is that find_if and lower_bound are really not meant to return the same thing.

Here is the problem that I found while debugging. First, I tweaked your operator() so that it would be easier to debug by using operator[] and expanding into if statements. The compiler stopped inlining it and I was able to set breakpoints and see what was happening.
1
2
3
4
5
6
7
8
9
10
bool operator()(const UINT32& lhs, const UINT32& rhs) { 
      if(_v[lhs].bookCount < _v[rhs].bookCount) 
      { 
	  return true; 
      } 
      else 
      { 
	  return false; 
      } 
  } 


You are specifying a book count of 3 for lower_bound but the functor doesn't take a book count it takes an index into the other vector stored in the functor. Therefore the right hand argument of the operator() will always be 3. Therefore, lower_bound is working as it was designed. It will compare the bookcount using the lhs to the bookcount for person _v[3] which happens to be person 3 which has 6 books. Therefore lower bound thinks that you are looking for the first person with at least 6 books. Step through the function in your debugger and you will see what I mean.

Sorry I don't have time to fiddle with a possible solution right now but that should help you a bit. You really have to step through the code to understand what lower_bound is trying to do.
kempofighter (and everyone),
thank you for your help. Your comment "the right hand argument ... will always be 3" has given me a clue which helped me solve my problem, as far as giving me the right answer is concerned.

To compare the desired "book count" value, I added two more operator() functions to which I pass the book count argument as a boost shared pointer:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#include <boost/shared_ptr.hpp>
using namespace boost;

// the Compare Book Count function
class CompareBookCount
{
public:
  CompareBookCount(vector<_pbk_t> v) : _v(v) {}
  // used for lower_bound, upper_bound, etc...
  bool operator()(const shared_ptr<UINT32> lhs, const UINT32& rhs) {
    return (_v.at(rhs).bookCount < *(lhs.get()));
  }
  bool operator()(const UINT32& lhs, const shared_ptr<UINT32> rhs) {
    return (_v.at(lhs).bookCount < *(rhs.get()));
  }
  // used for sort(...)
  bool operator()(const UINT32& lhs, const UINT32& rhs) {
    return (_v.at(lhs).bookCount < _v.at(rhs).bookCount);
  }
  // used for find_if
  bool operator()(const UINT32& p) {
    return _v.at(p).bookCount == _bkCount;
  }
  inline void SetBookCount(const UINT32& bkCount) { _bkCount = bkCount; };
private:
  vector<_pbk_t> _v;
  UINT32 _bkCount;
};


On runtime, this leads to:
1
2
3
4

lower_bound seach for person with a book count of 3...
+ Person 2 has 3 book(s).


Voila! So it now works. Strictly speaking, it is not that shared_ptr was the answer, but that I am forcing the comparison function to know which argument is the book count.

Although I get the correct answers every time and am enjoying lower_bound speed, something tells me that this is "wrong", as far as good programming practice goes. Furthermore, VS2008 produces an "invalid operator<" ASSERT in Debug mode (in Release mode all is well). I have seen posts on this behaviour, but not an answer... so perhaps there is a better solution?

Any thoughts?

Thanks again, David.
My apologies - the first operator() method should be:

1
2
3
4
5
6
7
8
9
10
11
// the Compare Book Count function
class CompareBookCount
{
public:
  CompareBookCount(vector<_pbk_t> v) : _v(v) {}
    // used for lower_bound, upper_bound, etc...
    bool operator()(const shared_ptr<UINT32> lhs, const UINT32& rhs) {
      return (*(lhs.get()) < _v.at(rhs).bookCount);
    }
    ...
}


The call to lower_bound is therefore:

1
2
  shared_ptr<UINT32> spbkCount(new UINT32(bookCount));
  _it_ui32 = lower_bound(vPersons.begin(), vPersons.end(), spbkCount, compareBookCountFunc);


Note that this still produces an "invalid operator>" ASSERT in VS2008 during debug mode, although Eclipse CDT debugs without complaint.

Thanks in advance, David.
I have a couple of suggestions. First, you don't need to use the typedef keyword for declaring the struct. In C++ it is easier to just write it this way.

1
2
3
4
5
struct _pbk_t
{
   UINT32 personID;
   UINT32 bookCount;
};



Secondly, I would greatly simplify the functor and break it up into more intuitive pieces. If you want to compare book counts, it is very simple and it is equally simple to compare personId information. In fact this method is much simpler and allows you to sort the objects more directly.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class CompareBookCount
{
public:
    CompareBookCount(UINT32 count = 0) : bkCount_(count) {}

    // use for sorting and lower bound
    bool operator() (const _pbk_t& lhs, const _pbk_t& rhs) { return lhs.bookCount < rhs.bookCount; }

    // use for finding by book count
    bool operator() (const _pbk_t& lhs) { return lhs.bookCount == bkCount_; }

    void setBookCount(UINT32 count) { bkCount_ = count; }

private:
    UINT32 bkCount_;
};

class ComparePersonId
{
public:
    bool operator() (const _pbk_t& lhs, const _pbk_t& rhs) { return lhs.personID < rhs.personID; }
};

as far as lower bound goes, I don't see how using shared_ptr really helps you much. I don't have time to debug that to determine why it works. Wouldn't it be simpler to call it like so? Create a temporary struct object and use the array of structures directly.

1
2
3
4
5
_pbk_t person = { 0, 3 }; // personId doesn't matter, set book count to 3.
std::vector<_pbk_t>::iterator pos = lower_bound(vPerBk.begin(), vPerBk.end(), person , CompareBookCount);

// to sort
std::sort(vPerBk.begin(), vPerBk.end(), CompareBookCount);


In summary, I'd simplify the whole program by eliminating the array of integers and simply operate directly on the array of person objects. That is my two cents. Have fun. By the way, I didn't compiler those snips. I may not have named all of the variables as you did but I am certain that the concept works if you incorporate something like that into your program. Also, lower_bound really expects a functor that takes two args of the same type. I'm not sure why that example you have written compiles.
Last edited on
Topic archived. No new replies allowed.