search based on two keys

Pages: 12
May 26, 2008 at 11:47pm
Hi,
I have a data file in the following format. Date1 and Date2 ate Date fields in string format. Value1 and Value2 are integers.

I want to read the file into some data structure in memory so that I can do some searches based on Date1 and Date2 fields at the same time.

For example, I want to search and find, for each entry in Date1, entries where Date2 are the same. (I've separated the data file using a blank row below to show the entries I'm interested in. (I don't have any blank rows in the original data file.

Can someone tell me how I can do that ? Any code would be really appreciated. If you can show me how to do that using stl containers and algorithms, it would be really great, but that's not a absolute requirement.

Thank you very much.

John



Date1 Date2 Value1 Value2
----- ----- ------ -----
d1 e1 1 1
d1 e1 1 2

d1 e2 2 3
d1 e2 3 1

d2 e2 2 1

d2 e3 1 2
May 26, 2008 at 11:54pm
I would use a map of vectors. The key for your map would be Date1 field. Stored against that would be any records with that Date1. Then you could iterate through the vector to query them all.

That should be sufficient enough advice for you to complete the task.
May 27, 2008 at 12:09am
Thanks much for your quick reply.

I see what you're saying. I have a
Struct {
Date2;
Value1;
Value2;
}
and have a map<Date, Struct>. But, first, I have duplicate Date1 entries. So, I will have to use a multimap. Now, I can query the multimap and get all the Date1 entries with d1 say. Now, I have to again do another search to group e1's together, and e2's together for d1 and so on. It is this second search i am having touble with.

Thanks much again for your help.

John
May 27, 2008 at 12:12am
I was thinking more of map<Date1, vector<Struct> > . This organises your vectors based on Date1. So for any given Date1 you can have >= 0 Records in the Vector.

Ok. Here is an example I have.

1
2
3
4
5
map<string, vector<double> > mvProportionMatrix;

void CProportionsObservation::addProportion(string Group, double Proportion) {
  mvProportionMatrix[Group].push_back(Proportion);
}
Last edited on May 27, 2008 at 12:17am
May 27, 2008 at 12:18am
thanks for the clarification. if I do that, how can I do a second search to get the record with same Date2. I want the entries with Date2 the same for each Date1.

Thanks again for your help. ; )

John
May 27, 2008 at 12:22am
So you want them together based on Date1, then Date2 then the information?

map<string, map<string, vector<Struct> > > mvMatrix;
It starts getting a bit nasty at this point, especially when you have to iterate back through it. You could always put a wrapper class around it.

To iterate through the normal map of vectors here is my code
1
2
3
4
5
6
7
8
9
10
map<string, vector<double> >::iterator vPropPtr = mvProportionMatrix.begin();
// Through Each Map Object
 while (vPropPtr != mvProportionMatrix.end()) {
   // Through Each Object in Vector
   vector<double>::iterator vPtr2 = ((*vPropPtr).second).begin();
   while (vPtr2 != ((*vPropPtr).second).end()) {
     vPtr2++;
   }
   vPropPtr++;
 }


Note: I will swap this over to use the BOOST_FOREACH construct at some point in the future as the code is cleaner.
Last edited on May 27, 2008 at 12:23am
May 27, 2008 at 12:37am
yes, I want to "process" the information of a group. Grouping is to achive based on Date1 and Date2.

I'm still a bit unclear of the use of map inside the map. Since Date2 can be duplicates, shouldn't I use a multimap ?

Also, is there a way to get all the keys (or unique keys) in a multimap/map ?

Really appreciate your help ; )
May 27, 2008 at 12:42am
Hmmmm have not seen a multimap being used before. I don't like the idea of the "pair" concept they have. Personally I would avoid it.

The idea behind a map is you can store 1 Value against any given Key. For you the Key would be Date1 and the Value would be another Map (Map2). Map2 would store a Key Date2 and a Vector of Struct.

So To access any set of Value1, Value2 you would need to use
myMap[Date1][Date2];
vs (Multi-Map)
m.insert(pair<const char* const, int>("b", 6));

I guess it's like having a multi-dimensional map with a vector as the object being stored. It just seems cleaner to me than using a multi-map and having to worry about Pairs.

To get all of the unique keys from a map of map you would iterate through the first map, iterate through each map (Map2) within this and get all of the keys from each.

Does this make sense?
Last edited on May 27, 2008 at 12:44am
May 27, 2008 at 12:55am
Ok. Here is some code I have quickly put up. This will compile as-is under GCC/MingW 3.4.5+

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#include <iostream>
#include <map>
#include <vector>
#include <string>

using namespace std;

struct Data {
  int Value1;
  int Value2;
};

map<string, map<string, vector<Data> > > myMap;

void addData(string Date1, string Date2, int V1, int V2) {
  Data myData;
  myData.Value1 = V1;
  myData.Value2 = V2;
  myMap[Date1][Date2].push_back(myData);
}


int main() {
  int iTotal = 0;

  addData("Blah1", "Blah2", 1, 1);
  addData("Blah1", "Blah2", 1, 1);
  addData("Blah2", "Blah1", 1, 1);
  addData("Blah2", "Blah3", 1, 1);

  cout << "There Are " << myMap.size() << " different Date1's" << endl;

  // Loop Through Each Date1, Count How many Date2's Each Date1 Has.
  map<string, map<string, vector<Data> > >::iterator vPtr = myMap.begin();
  while (vPtr != myMap.end()) {
    cout << "Date1: " << (*vPtr).first << " has " << myMap[(*vPtr).first].size() << " different Date2s" << endl;
    iTotal += myMap[(*vPtr).first].size();
    vPtr++;
  }
  cout << "There are a total of " << iTotal << " dDate2's across your Date1's" << endl;

  cout << "Printing Pairs: " << endl;
  // Loop Through And List of All Date2s
  vPtr = myMap.begin();
  while (vPtr != myMap.end()) {
    // Create Date2 Iterator
    map<string, vector<Data> >::iterator v2Ptr = myMap[(*vPtr).first].begin();
    while (v2Ptr != myMap[(*vPtr).first].end()) {
      cout << "Pair: " << (*vPtr).first << " : " << (*v2Ptr).first << endl;
      v2Ptr++;
    }
    vPtr++;
  }

  return 0;
}


The Result is:
1
2
3
4
5
6
7
8
There Are 2 different Date1's
Date1: Blah1 has 1 different Date2s
Date1: Blah2 has 2 different Date2s
There are a total of 3 dDate2's across your Date1's
Printing Pairs: 
Pair: Blah1 : Blah2
Pair: Blah2 : Blah1
Pair: Blah2 : Blah3 


Is this what you are after? You can see why I'd use a Wrapper class around it to tidy the code up a little. But it's nothing major.
Last edited on May 27, 2008 at 12:55am
May 27, 2008 at 1:01am
Yes, being able to access myMap[Date1][Date2] would be neat.

Let me try to code it as you suggested, thanks much for your help.

John
May 27, 2008 at 1:09am
no worries. Feel free to use the sample I have provided as a base :)
May 27, 2008 at 1:17am
There are simple STL algorithms that can handle the
map< string, vector<string> >
and
map< string, map< string, vector<Struct> > >
easily.

FIRST STRUCTURE map< string, vector<string> >

You'll first need a functor to find date2:
1
2
3
4
5
6
7
struct date_equals
  {
  bool operator () (string date, const Struct& struc)
    {
    result = (date == struc.date);
    }
  };


If you want to know how many elements of date1 match date2.
1
2
3
4
5
long num_matches = count(
  m[ date1 ].second.begin(),
  m[ date1 ].second.end(),
  bind1st( date_equals(), date2 )
  );


Copy elements that match date1 and date2 into a new vector.
1
2
3
4
5
6
vector<Struct> date1and2data( num_matches );
copy(
  m[ date1 ].second.begin(),
  m[ date1 ].second.begin() +num_matches,
  date1and2data.begin()
  );


Alas, there's no 'copy_if()' STL algorithm (an oversight), but you can create your own easily. Let's make one that knows how to push_back():
1
2
3
4
5
6
7
8
9
10
11
template <class Container, class UnaryPredicate>
void push_back_if(
  Container&                   container,
  typename Container::iterator begin,
  typename Container::iterator end,
  UnaryPredicate               pred
  ) {
  for (; begin != end; begin++)
    if (pred( *begin ))
      container.push_back( *begin );
  }

Now you can directly copy elements matching date1 and date2 by appending them to a new vector.
1
2
3
4
5
6
7
vector<Struct> date1and2data;
push_back_if(
  date1and2data,
  m[ date1 ].second.begin(),
  m[ date1 ].second.end(),
  bind1st( date_equals(), date2 )
  );


If you want to do something to a range of dates, you can use transform()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
struct print_stuff
  {
  string date1, date2;
  ostream& outs;
  explicit print_stuff( string date1, string date2, ostream& outs ): date1(date1), date2(date2), outs(outs) { }
  Struct operator () (const Struct& struc)
    {
    outs << date1 << ", " << date2 << ", " << struc.value1 << ", " << struc.value2 << endl;
    return struc;  // don't actually change anything
    };
  };
...
transform(
  date1and2data.begin(),
  date1and2data.end(),
  print_stuff( date1, date2, cout )
  );


Or you can, again, go directly for it:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
struct print_stuff
  {
  string date1, date2;
  ostream& outs;
  explicit print_stuff( string date1, string date2, ostream& outs ): date1(date1), date2(date2), outs(outs) { }
  Struct operator () (const Struct& struc)
    {
    if (struc.date == date2)
      outs << date1 << ", " << date2 << ", " << struc.value1 << ", " << struc.value2 << endl;
    return struc;
    }
  };
...
transform(
  m[ date1 ].second.begin(),
  m[ date1 ].second.end(),
  print_stuff( date1, date2, cout )
  );



SECOND STRUCTURE map< string, map< string, vector<Struct> > >

It isn't necessarily as intimidating as it appears. In fact, in some respects its easier. Now for the same stuff as above.

To count the number of matching dates
 
size_type num_matches = m[ date1 ].second[ date2 ].second.size();

Woah! Nice!

Copy matching elements
Oh, wait... You've already got a list of matching elements! :-)
 
m[ date1 ].second[ date2 ].second


To do something to a range of elements, transform() still fits the bill.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
struct print_stuff
  {
  string date1, date2;
  stream& outs;
  explicit print_stuff( string date1, string date2, stream& outs ): date1(date1), date2(date2), outs(outs) { }
  Struct operator () (const Struct& struc)
    {
    outs << date1 << ", " << date2 << ", " << struc.value1 << ", " << struc.value2 << endl;
    return struc;
    }
  };
transform(
  m[ date1 ].second[ date2 ].second.begin(),
  m[ date1 ].second[ date2 ].second.end(),
  print_stuff( date1, date2, cout )
  );

Amazing.

BTW. Hopefully you noticed that the first and third 'print_stuff' functors were exactly the same, and the second differed by only one line. (Of course, the type of 'Struct' was different between the first two and the last.)

Hope this helps.

[edit] Yoinks! I'm living in molasses land....
Last edited on May 27, 2008 at 1:18am
May 27, 2008 at 1:20am
Ahhh a very nice addition Duoas. I kinda figured you'd chime in shortly :)

Couldn't
size_type num_matches = m[ date1 ].second[ date2 ].second.size();
be:
size_type num_matches = m[ date1 ][ date2 ].size();
Last edited on May 27, 2008 at 1:33am
May 27, 2008 at 1:40am
I'm sure it can. I've never actually played much with maps... (so I forget stuff) :-P
Last edited on May 27, 2008 at 1:40am
May 27, 2008 at 1:05pm
Thanks much guys. I'm trying to digest all this. Really appreciate your help.

John
May 27, 2008 at 6:20pm
There are few more options though, if you like.

I would write a binary searh tree or a sorted linked list.
As I read the file and store in tree or list, I would check if it is already there or there is a match. Would be simple I guess.

If I know better details of your task and it is simple, I would write a simple sorted linked list or binary search tree.

Check it out. Good luck :)
May 27, 2008 at 7:19pm
) Sorted Linked List
Remember he wants them indexed by Date1 first, then by Date2 based on Date1, and finally needs a collection of records off each Date1/Date2 combination. The linked list is a very linear storage mechanism that wouldn't provide easy/clean methods to access the collections of pairs. There is no easy mechanism for grabbing collections without continually iterating through the list.

) Binary Search Tree (I will address a Tree, as the Binary Tree only allows for 2 nodes, this wouldn't be overly helpful here, as he isn't ordering Date1/Date2 combinations).
A tree would be a good way to accomplish this. The first 2 layers of Nodes would be your Date1 and Date2. Level 3 would contain the collection of structs. The problem with this is Date1, Date2 are strings while the collections are Structs. While it'd be possible to develop this, you'd have to write the container yourself. Alot of un-needed code when a map/vector combination is more than capable of achieving the results. You'd end up using STL containers within your tree anyways.
Last edited on May 27, 2008 at 7:20pm
May 27, 2008 at 11:33pm
Hi Everyone, I just wanted to add .... I'm making good progress on this thanks to everyone. Many thanks to all. ; )

I just started picking up stl. Stl looks a bit complex to me right now but I'm really glad I found this forum. You guys gave me a lot of confidence to carry on.

Thank you very much, everyone!

John
May 28, 2008 at 12:10am
hah. I must say this would be a fairly harsh introduction to the STL. Most people get worried about using a vector for the first time. You have to do a map, of maps of vectors :P

You will quickly find the STL to be extremely useful, and a valuable skill to have. Good luck with your project

Z.
May 28, 2008 at 12:28am
Thanks ; )

Yes, I'm kind of determined to learn stl. I'm using all these small projects as a learning experience.

Thanks again.

John
Pages: 12