Mapping/indexing a vector<string>

Jan 9, 2011 at 6:41pm
Hoping someone can point me in the right direction...
I have a vector<string> that contains a wordlist. (for obvious reasons, im calling it "wordlist"

Im trying to optimize my program so that i dont have to run through the entire vector to get to the right word.

For example, I need to start at 'E' so any words before or after E would be a waste of time/calculation.

I know im looking for some kind of mapping or indexing where I can just skip to the place in the vector that contains words beginning with E but im not entirely sure what to look for or what the best option would be.

Anyone have any ideas?

(reading through it might be a bit vague...)
I cant just search for a particular word...i am taking all words starting with a particular letter and trying to match them to a random set of letters on a game board.
Last edited on Jan 9, 2011 at 6:59pm
Jan 9, 2011 at 7:27pm
see std::map lower_bound and upper_bound functions
http://www.cplusplus.com/reference/stl/map/
Jan 9, 2011 at 9:24pm
Thanks for the pointer.
I went with a multimap and used equal range. works like a champ!

to load my wordlist, i went the route of"
1
2
3
4
5
while (!words.eof())
	{
		getline (words, word);
		wordlist2.insert(pair<char, string>(word[0], word));
	}


i know vectors could be loaded like this (and i can confirm it works since this is how i had it prior to switching to a map:
 
vector<string> wordlist = vector<string> ( istream_iterator<string> (words), istream_iterator<string>() );

I am trying to do something equally as elegant to load up my map, but its not really coming together...I wish I could debug it but the error messages are pretty huge and use all sorts of short hand that make debugging it a real pain.
im doing this, but having trouble figuring out the syntax to make it work properly:
 
wordlist2.insert(pair<char, string>( ((istream_iterator<string> (words)->c_str())[0]), istream_iterator<string> (words) );

basically taking the first letter of the string as the key and the string as the value.

and ideas on what I am messing up?
Jan 9, 2011 at 11:19pm
It seems to me you would be happier using a std::set <string>.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
using namespace std;

struct wordlist_t: set <string> { };

int main()
  {
  wordlist_t wordlist;

  ifstream f( "a.cpp" );
  wordlist.insert( istream_iterator <string> ( f ), istream_iterator <string> () );
  f.close();

  cout << "There are " << wordlist.size() << " words in the list. They are:\n";
  for (wordlist_t::const_iterator
       word  = wordlist.begin();
       word != wordlist.end();
     ++word)
    cout << *word << endl;

  return 0;
  }

Hope this helps.
Jan 9, 2011 at 11:39pm
I thought about using a set, but I need to be able to get a subset wordlist based on a letter.
When i think about a set i see a list with unique values, a multiset would have values that are not unique. So if i tried to get a subset by using equal_range, i would have to use an actual word rather than the first letter.

so a set = aide, ail, aim, etc...
where a map would be map = a:aide, a:ail, a:aim, b:bucket, etc..
I might be missing something as im not too familiar with sets/maps/ etc.. but i dont think I'd be able to achieve that functionality with a set.

is there a way to do a wordlist.insert(pair<x,y> ( istream......[first letter], istream....(word) ) ?
Last edited on Jan 10, 2011 at 2:23am
Jan 10, 2011 at 11:25am
Oops! I didn't read carefully enough. :-(
Last edited on Jan 10, 2011 at 11:25am
Jan 10, 2011 at 1:05pm
no wait!
Duoas is right. You need a set here.
If you have a set of "banana", "cat", "cow", "dog", "moose", then lower_bound("c") will return an iterator to "cat" and lower_bound("d") will return an iterator to "dog". All the elements starting with 'c' are in range ["cat"; "dog")
Jan 10, 2011 at 2:57pm
I tried it out, and at least for me, its not working.
loaded the wordlist and did the following...
1
2
pair<set<string>::iterator, set<string>::iterator> word;
word = wordList.equal_range(board_letter);


I get the following error... feel free to take a crack at decoding the error message :)
from what I gather, equal range doesnt like that i am using a char instead of a string and of course, it wont let me cast it.

error C2664: 'std::pair<_Ty1,_Ty2> std::_Tree<_Traits>::equal_range(const std::basic_string<_Elem,std::char_traits<char>,_Ax> &)' : cannot convert parameter 1 from 'char' to 'const std::basic_string<_Elem,_Traits,_Ax> &'
1> with
1> [
1> _Ty1=std::_Tree<std::_Tset_traits<std::string,std::less<std::string>,std::allocator<std::string>,false>>::iterator,
1> _Ty2=std::_Tree<std::_Tset_traits<std::string,std::less<std::string>,std::allocator<std::string>,false>>::iterator,
1> _Traits=std::_Tset_traits<std::string,std::less<std::string>,std::allocator<std::string>,false>,
1> _Elem=char,
1> _Ax=std::allocator<char>
1> ]
1> and
1> [
1> _Elem=char,
1> _Traits=std::char_traits<char>,
1> _Ax=std::allocator<char>
1> ]
1> Reason: cannot convert from 'char' to 'const std::basic_string<_Elem,_Traits,_Ax>'
1> with
1> [
1> _Elem=char,
1> _Traits=std::char_traits<char>,
1> _Ax=std::allocator<char>
1> ]
1> No constructor could take the source type, or constructor overload resolution was ambiguous

I might just take the hit to memory and stay with my multimap instead of recoding everything to make a set work.

Now back to my question... does anyone know the proper syntax to do:
wordlist.insert( istream_iterator <string> ( f ), istream_iterator <string> () );

but in a multimap:
wordlist.insert(pair<x,y> ( istream......[first letter], istream....(word) ) ?
Last edited on Jan 10, 2011 at 3:04pm
Jan 10, 2011 at 3:17pm
Yes, you can only use a set (or any other container) of strings with strings. It's easy to convert char to string.

Here's a code example which should work (I think.. I didn't compile it):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#include <iostream>
#include <set>
#include <string>

string to_str(char c){
   return string(1, c);
}

int main(){
   std::set<std::string> worslist;
   wordlist.insert("banana");
   wordlist.insert("cat");
   wordlist.insert("cow");
   wordlist.insert("dog");
   wordlist.insert("moose");

   std::set<std::string>::iterator first, last;
   char ch = 'c';
   first = wordlist.lower_bound(to_str(ch));
   last = wordlist.lower_bound(to_str(ch+1));

   for(std::set<std::string>::iterator i = first; i < last; i++) std::cout << *i << std::endl;

   std::cin.get();
   return 0;
}


As for your last question, I have no idea what you're talking about.
Jan 10, 2011 at 3:38pm
You would be correct sir, it has the same functionality. I tested it out just now and it looks to be doing the same thing. Thanks!!
Also, this:
If x does not match any key in the container, the range returned has a length of zero, with both iterators pointing to the nearest value greater than x, if any, or to set::end if x is greater than all the elements in the container.

As for my last question...
When using a set to load a word list you would do this:
 
wordList.insert(istream_iterator<string> (words), istream_iterator<string> ());


now say I wanted to do the same for a multimap where the key is the first letter of the word and the value is the word. Previously I was trying to do this:
 
wordList.insert(pair<char, string>( ((istream_iterator<string> (words)->c_str())[0]), istream_iterator<string> (words) );


but that really wasnt working. so I was asking what would be the proper syntax to load a wordlist into a multimap via an istream_iterator ?
Jan 10, 2011 at 4:04pm
basically taking the first letter of the string as the key and the string as the value.
Try to implement a trie http://en.wikipedia.org/wiki/Trie (maybe too general)

what would be the proper syntax to load a wordlist into a multimap via an istream_iterator ?
Why don't just use a loop?
Last edited on Jan 10, 2011 at 4:04pm
Jan 10, 2011 at 4:14pm
I looked at trie's as well, its memory handling is very compact, but the setup time knocked it out of possible solutions.

I was actually using a loop, but i wanted to handle it more elegantly.
1
2
3
while (getline(words, word) {
		wordlist.insert(make_pair(word[0], word));
	}
Jan 10, 2011 at 11:07pm
A little functor will do it also.

1
2
3
4
5
6
7
8
struct mmt: std::pair <char, string>
  {
  mmt( const std::string& s ):
    std::pair <char, string> ( s[ 0 ], s )
    { }
  };

wordlist.insert( std::istream_iterator <mmt> ( f ), std::istream_iterator <mmt> () );

I still think you are doing this the hard way.
Jan 10, 2011 at 11:11pm
Not anymore, I changed over to a set - a little less complicated during runtime and the code is easier to read.
But it was still on my mind if you could do the same istream_iterator for map.
Thanks for the help though. it has been enlightening :)
Jan 12, 2011 at 1:27pm
Its a bit late, but still..

I will need to do some reading to understand the code from Duoas.

The code below may work for you. You can substitute the array with vector easily.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#include ...

using namespace std;

template <typename T, size_t S>
inline size_t array_size(const T (&array)[S])
{
    return S;
} // mostly borrowed from another post

int main()
{
    static const char *strings[] = {
        "h",
        "he",
        "hel",
        "hell",
        "hello",
        "w",
        "wh",
        "whe",
        "wher",
        "where"
    };

    multimap <char, string> dict;

    struct Transformer {
        static pair<char, string> transform(const char *s)
        {
            return pair<char, string>(s[0], s);
        }
    };

    transform(&strings[0], &strings[array_size(strings)],
              inserter(dict, dict.begin()),
              Transformer::transform);

    struct Printer {
        static void print (const pair<char, string> &v)
        {
            cout << v.first << ' ' << v.second << endl;
        }
    };

    for_each(dict.begin(), dict.end(), Printer::print);
}
Last edited on Jan 12, 2011 at 9:35pm
Topic archived. No new replies allowed.