Searching through a massive list of words

Feb 13, 2012 at 6:39pm
I have a huge group of words, close to 45,000, that I need to keep in some sort of data structure that I can access easily. I need to be able to search through the group by which letters are in the words, because I am making a sort of Text Twist game...

Should I use an array? That seems inefficient for searching through all those words.
Feb 13, 2012 at 7:34pm
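Are the words unique? If they are, you can use something like a std::set. For an exact lookup, its find member function searches in logarithmic time. You can also use std::find_if or maybe find_first_of, depending on what you are trying to do, but those scan the container element by element. http://www.cplusplus.com/reference/algorithm/find_if/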
Feb 14, 2012 at 2:02am
Well, basically I need some data structure containing all these words. The user is going to get 7 random letters and will have to find words with those letters, so I need some way to search through the list of words to see if what they entered is a valid word from the list.
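
For what it's worth, the check described above could be sketched roughly like this. It is only a sketch under assumptions: the function names uses_only_rack_letters and is_valid_guess are made up for illustration, the words are assumed to be lowercase a-z, and std::set is just one possible container for the word list.

```cpp
#include <array>
#include <set>
#include <string>

// True if `guess` can be spelled using each letter of `rack` at most once.
// Assumption: both strings contain only lowercase ASCII letters.
bool uses_only_rack_letters(const std::string& guess, const std::string& rack)
{
    std::array<int, 26> available{};                 // how many of each letter the rack holds
    for (char c : rack) {
        if (c < 'a' || c > 'z') return false;        // reject anything outside a-z
        ++available[c - 'a'];
    }
    for (char c : guess) {
        if (c < 'a' || c > 'z') return false;        // reject anything outside a-z
        if (--available[c - 'a'] < 0) return false;  // letter used more often than dealt
    }
    return true;
}

// A guess counts if it is in the word list and fits the dealt letters.
bool is_valid_guess(const std::set<std::string>& dictionary,
                    const std::string& guess,
                    const std::string& rack)
{
    return dictionary.count(guess) > 0 && uses_only_rack_letters(guess, rack);
}
```

The dictionary lookup is logarithmic in the number of words, and the letter check only depends on the length of the guess, so 45,000 words should not be a problem with this approach.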
Feb 14, 2012 at 4:08am
You can use a 'prefix tree' (trie) or an 'rb tree' (red-black tree).
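
In case it helps, a prefix tree can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: the class name Trie and its member names are made up here, and std::map is used for the child links only to keep the code short.

```cpp
#include <map>
#include <memory>
#include <string>

// Minimal prefix tree: each node maps a character to a child node.
class Trie {
public:
    // Add a word, creating nodes along its path as needed.
    void insert(const std::string& word)
    {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->is_word = true;   // mark the end of a complete word
    }

    // True only if the exact word was inserted.
    bool contains(const std::string& word) const
    {
        const Node* node = find_node(word);
        return node != nullptr && node->is_word;
    }

    // True if at least one inserted word starts with `prefix`.
    bool has_prefix(const std::string& prefix) const
    {
        return find_node(prefix) != nullptr;
    }

private:
    struct Node {
        bool is_word = false;
        std::map<char, std::unique_ptr<Node>> children;
    };

    // Walk down the tree along `s`; nullptr if the path does not exist.
    const Node* find_node(const std::string& s) const
    {
        const Node* node = &root_;
        for (char c : s) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    Node root_;
};
```

A lookup takes time proportional to the length of the word, not to the number of words stored, and has_prefix lets you stop early when no word can start with the letters tried so far.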
Feb 14, 2012 at 5:39am
So that would be faster than using a simple array? Sorry, I'm new to data structures, and I've never worked with numbers this big.
Feb 14, 2012 at 9:39am
A sorted tree would certainly be faster for lookups than a simple unsorted array. But before you do something complicated, try something simpler like std::list http://www.cplusplus.com/reference/stl/list/
It has a built-in sort (list::sort), though sorting a list is not inherently faster than sorting an array.

A modern PC should be fast enough to deal with 45,000 or more words (even with an array). Just try it out.
Last edited on Feb 14, 2012 at 12:00pm
Feb 14, 2012 at 11:31am
Have a look at the CLucene search engine.
Feb 14, 2012 at 5:32pm
Thanks! CLucene is exactly what I was looking for. However, I can't find any documentation for it, so I have no idea how to use it. Do you have a link to somewhere I could learn how to use it?
Feb 14, 2012 at 8:42pm
Also coder777, how can I search through lists? I did not see a function for searching.

Thanks!
Feb 15, 2012 at 6:58am
Yes, you can do this with

http://www.cplusplus.com/reference/algorithm/find/

or

http://www.cplusplus.com/reference/algorithm/binary_search/

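which is faster, but only works if the container is sorted first. It pays off most with random-access containers such as std::vector; on a std::list it needs fewer comparisons but still has to step through the nodes.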

It's like so:
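#include <algorithm>  // std::find
#include <list>
#include <string>

std::list<std::string> l;

...

// std::find walks the list from the front and returns l.end() if nothing matches
std::list<std::string>::iterator it = std::find(l.begin(), l.end(), "find_this");
if (it != l.end())
{
    // found it
}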
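
For the binary_search variant, here is a rough sketch. It assumes the words are kept in a std::vector instead of a std::list; the helper names prepare_word_list and contains_word are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort once and drop duplicates so the precondition of std::binary_search holds.
void prepare_word_list(std::vector<std::string>& words)
{
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
}

// Requires `sorted_words` to be sorted, e.g. by prepare_word_list.
bool contains_word(const std::vector<std::string>& sorted_words,
                   const std::string& word)
{
    return std::binary_search(sorted_words.begin(), sorted_words.end(), word);
}
```

The sort is paid once after loading; each lookup afterwards needs only about 16 comparisons for 45,000 words, versus up to 45,000 with std::find.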
Feb 15, 2012 at 5:17pm
OK, and this is kind of a silly question, but what would be the most efficient way of building a list with over 45,000 strings in it? I can't add them individually because the project is due in a couple of days. Should I use a for loop or something?
Feb 15, 2012 at 5:23pm
should i use a for loop or something?
Definitely use a loop...

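#include <fstream>
#include <list>
#include <string>
//...
std::ifstream file("words.txt");   // placeholder file name, one word per line
std::list<std::string> words;
std::string word;
while (std::getline(file, word))   // becomes false at the end of the file
{
    words.push_back(word);         // put word in container
}
//...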