Binary search of a dynamic array of strings

Hey yall, so for school I have to make a recursive word ladder program, and I wanna put a binary search in it. I have a function that checks the inputted word in the dictionary of words and returns a bool. I had it as an iterative function but going through a 187000-word dynamic array just to find 1 isn't very fast. I've been trying to make the binary search work but it just doesn't, I know its something with the word compares but this is probably version 10 of my function and it still sucks. any help or tips would be extremely appreciated. Thank you!

global_count is just a tracker for dev stuff, WORDCOUNT is 187720 which is the number of words in the dynamic array wordsArray.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
bool validWord(string word, string *&wordsArray)
{
    cout << "running valid word: " << global_count << endl;
    ++global_count;

    int mid;
    int low = 0;
    int high = WORDCOUNT;

    while (low <= high)
    {
        // find the middle index
        mid = low + ((high - low) >> 1);

        if (word == wordsArray[mid]) // word found
        {
            return true;
        }
        else if (word > wordsArray[mid]) // word may be on the left half
        {
            high = mid - 1;
        }
        else // word may be on the right half
        {
            low = mid + 1;
        }
    }

    // key not found
    return false;
}
This is a classic bisecting search.
Try this (not tested):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#include <iostream>
#include <string>
constexpr int WORDCOUNT = 187720;

int validWord(const std::string word, const std::string*& wordsArray)
{
    int mid;
    int low = 0;
    int high = WORDCOUNT;
    int iterations = 0; 

    mid = low + ((high - low) >> 1);    //  Initial midpoint
    while (low != high)
    {
        iterations++;
        if (word == wordsArray[mid]) // word found
        {
            return iterations; 
        }
        else if (word > wordsArray[mid]) // word may be in second half 
        {
            low = mid;      //  Can't be lower than old midpoint
            mid = low + ((high - low) >> 1);    //  New midpoint
        }
        else // word may be in first half
        {
            high = mid;     //  Can't be higher than old midpoint
            mid = low + ((high - low) >> 1);    //  New midpoint
        }

    }
    // key not found
    return 0;
}

Note: It returns the number of iterations required.
Last edited on
OP was very close to a working solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
bool validWord2(string word, string *&wordsArray)
{
    int mid;
    int low = 0;
    // subtract one to avoid ever accessing wordsArray[WORDCOUNT]
    int high = WORDCOUNT - 1; 

    while (low <= high)
    {
        // find the middle index
        mid = low + ((high - low) >> 1);

        if (word == wordsArray[mid]) // word found
        {
            return true;
        }
        // Change greater-than to less-than
        else if (word < wordsArray[mid]) // word may be on the left half
        {
            high = mid - 1;
        }
        else // word may be on the right half
        {
            low = mid + 1;
        }
    }

    // key not found
    return false;
}


Alternatively, here's a complete example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <string>
#include <iostream>
#include <compare>

int constexpr WORDCOUNT = 26;

using namespace std;
bool validWord(string const&  word, string const *wordsArray)
{
  int begin = 0, end = WORDCOUNT - 1;

  while (begin <= end)
  {
    auto const middle = begin + (end - begin) / 2;
    auto const result = word <=> wordsArray[middle];
    if (result == 0) { return true; }
    if (result >  0) { begin = middle + 1; }
    if (result <  0) { end = middle - 1; }
  }
  
  return false;
}

int main(int argc, char** argv)
{
  string phonemes[] = 
  {
    "alpha"  , "bravo"   , "charlie", "delta",
    "echo"   , "foxtrot" , "golf"   , "hotel",
    "indigo" , "juliet"  , "kilo"   , "lima" ,
    "mike"   , "november", "oscar"  , "papa" ,
    "quebec" , "romeo"   , "sierra" , "tango",
    "uniform", "victor"  , "whiskey", "xray" ,
    "yankee" , "zulu"
  };    
  static_assert(size(phonemes) == WORDCOUNT);
  
  for (int i = 1; i < argc; ++i)
  {
    std::cout << "\"" << argv[i] << "\" is "
              << (validWord(argv[i], phonemes)? "": "not ")
              << "a NATO code word\n";
  }
}

Demo:
https://coliru.stacked-crooked.com/a/ceef5d8550cae8db

Finally there is std::binary_search already
https://en.cppreference.com/w/cpp/algorithm/binary_search
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#include <iostream>
#include <algorithm>
#include <vector>
 
int main()
{
    std::vector<std::string> dico { "Cat", "Dog", "Fish", "Snake", "Bird" };
    std::vector<std::string> words { "Fish", "Dog", "Scorpion" };
 
    for (auto word : words) 
    {
        std::cout << "Searching for " << word << '\n';
        
        if (std::binary_search(dico.begin(), dico.end(), word))
            std::cout << "Found the word " << word << '\n';
        else 
            std::cout << word << " does not exist in the dictionary!\n";
    }
}
This alternative using binary_search could be useful for you. It loads more than 370K words in less than 1 seconds (according to the specs of my new PC) and it can find a word in the vector words in a few microseconds. For this test, I use a text file which is available at the address below. Seems good enough ++

v2 : I added some perf measurements...

https://github.com/dwyl/english-words/blob/master/words_alpha.txt

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <chrono>

int main()
{
    std::vector<std::string> words;
    std::ifstream infile("dictionary.txt");
    std::string newWord;

    if (!infile.is_open())
    {
        std::cout << "Could not open the dictionary!" << std::endl;
        exit(EXIT_FAILURE);
    }
    // for perf
    auto start = std::chrono::high_resolution_clock::now();
    // create your vector with all words from your dictionary
    while (std::getline(infile, newWord))
    {
        transform(newWord.begin(), newWord.end(), newWord.begin(), ::tolower);
        words.push_back(newWord);
    }

    infile.close();
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Our dictionary with " << words.size() + 1 << " words has been read in " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;

    while (1) {
        std::string input;
        std::cout << "Give me a word : " << input;
        std::cin >> input;

        transform(input.begin(), input.end(), input.begin(), ::tolower);
        if (input == "exit") break;

        auto start = std::chrono::high_resolution_clock::now();

        if (std::binary_search(words.begin(), words.end(), input))
            std::cout << "The word " << input << " has been found :)" << std::endl;
        else
            std::cout << input << " does not exist in the dictionary!" << std::endl;

        auto end = std::chrono::high_resolution_clock::now();
        std::cout << "Operation done in " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << " ns" << std::endl;
    };

    return 0;
}


Our dictionary with 370106 words has been read in 998 ms
Give me a word : albatroos
albatroos does not exist in the dictionary!
Operation done in 94600 ns
Give me a word : albatros
The word albatros has been found :)
Operation done in 176900 ns
Last edited on
It load more 350K words in less than 1 seconds


Take a look at this thread, which talks a little about read performance and has a little measurements. It's the same thing, reading a dictionary file:
https://cplusplus.com/forum/general/280508/
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include <iostream>
#include <concepts>
#include <ranges>
#include <algorithm>
#include <string>
#include <vector>

template < std::ranges::forward_range RANGE, typename T,
           std::indirect_strict_weak_order< T*, std::ranges::iterator_t<RANGE> > CMP = std::ranges::less >
std::ranges::iterator_t<RANGE> binary_find( RANGE&& range, const T& value, CMP cmp = {} ) // TO DO: add support for projection
{
    const auto iter = std::ranges::lower_bound( range, value, cmp ) ;
    if( iter != std::ranges::end(range) && !cmp( value, *iter ) ) return iter ;
    else return std::ranges::end(range) ;
}

int main()
{
    std::vector<std::string> dictionary { "Cat", "Dog", "Fish", "Cow", "Crocodile", "Snake", "Bird" };
    std::ranges::sort(dictionary) ;
    for( std::size_t i = 0 ; i < dictionary.size() ; ++i ) std::cout << i << ". " << dictionary[i] << '\n' ;
    std::cout << "\n----------------\n\n" ;

    for( const char* const word : { "Fish", "Snake", "Scorpion", "Bird", "Crocodile" } )
    {
        if( auto iter = binary_find( dictionary, word ) ; iter != dictionary.end() )
            std::cout << "Found the word '" << word << "' at position " << iter - dictionary.begin() << '\n';
        else
            std::cout << "word '" << word << "' does not exist in the dictionary!\n";
    }
}

https://coliru.stacked-crooked.com/a/e11ce4ec6a220393
Why hold a dictionary in a std::vector and not a std::set? Consider (as C++20):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#include <set>
#include <iostream>
#include <string>
#include <iomanip>

int main() {
	const std::set<std::string> dictionary { "Cat", "Dog", "Fish", "Cow", "Crocodile", "Snake", "Bird" };

	for (size_t i {}; const auto& d : dictionary)
		std::cout << ++i << ". " << d << '\n';

	std::cout << "\n----------------\n\n";

	for (const char* const word : { "Fish", "Snake", "Scorpion", "Bird", "Crocodile" }) {
		if (dictionary.contains(word))
			std::cout << "Found the word " << std::quoted(word) << '\n';
		else
			std::cout << "word " << std::quoted(word) << " does not exist in the dictionary!\n";
	}
}



1. Bird
2. Cat
3. Cow
4. Crocodile
5. Dog
6. Fish
7. Snake

----------------

Found the word "Fish"
Found the word "Snake"
word "Scorpion" does not exist in the dictionary!
Found the word "Bird"
Found the word "Crocodile"

@Geckoo - For the search, you are also timing the cout statement! Consider:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <set>
#include <chrono>
#include <iomanip>
#include <cctype>

struct comma_facet : public std::numpunct<char> {
	explicit comma_facet(char del, uint8_t group) :g(group), delim(del) {}
	virtual char do_thousands_sep() const {
		return delim;
	}
	virtual std::string do_grouping() const {
		return std::string(1, g);
	}
	uint8_t g {};
	char delim {};
};

struct set000s {
public:
	explicit set000s(char del = ',', uint8_t grp = 3) : group(grp), delim(del) {}

private:
	uint8_t group {};
	char delim {};

	friend std::ostream& operator<<(std::ostream& os, const set000s& grp) {
		os.imbue(std::locale(os.getloc(), new comma_facet(grp.delim, grp.group)));
		return os;
	}
};

int main() {
	const auto lc { [](unsigned char ch) {return static_cast<char>(std::tolower(ch)); } };
	std::set<std::string> words;

	if (std::ifstream infile { "words1.txt" }) {
		const auto start { std::chrono::high_resolution_clock::now() };

		for (std::string newWord; std::getline(infile, newWord); words.insert(newWord))
			std::transform(newWord.begin(), newWord.end(), newWord.begin(), lc);

		const auto end { std::chrono::high_resolution_clock::now() };

		std::cout << "Our dictionary with " << set000s() << words.size() << " words has been read in " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
	} else
		return (std::cout << "Could not open the dictionary!\n"), EXIT_FAILURE;

	while (true) {
		std::string input;

		std::cout << "Give me a word : " << input;
		std::cin >> input;
		std::transform(input.begin(), input.end(), input.begin(), lc);

		if (input == "exit")
			break;

		const auto start { std::chrono::high_resolution_clock::now() };
		const auto fnd { words.contains(input) };
		const auto end { std::chrono::high_resolution_clock::now() };

		if (fnd)
			std::cout << "The word " << std::quoted(input) << " has been found :)\n";
		else
			std::cout << std::quoted(input) << " does not exist in the dictionary!\n";

		std::cout << "Operation done in " << set000s() << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << " ns\n";
	}
}


which for my laptop gives these timings:


Our dictionary with 466,546 words has been read in 298 ms
Give me a word : albatros
The word "albatros" has been found :)
Operation done in 2,371 ns
Give me a word : albatroos
"albatroos" does not exist in the dictionary!
Operation done in 2,766 ns
Give me a word : zebra
The word "zebra" has been found :)
Operation done in 3,952 ns
Give me a word : zebar
"zebar" does not exist in the dictionary!
Operation done in 3,556 ns
Give me a word : exit

Last edited on
> Why hold a dictionary in a std::vector and not a std::set?

For a large stable dictionary, searching in a sorted vector may be (typically would be) somewhat more efficient.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <iostream>
#include <ranges>
#include <algorithm>
#include <string>
#include <vector>
#include <set>
#include <random>
#include <ctime>


int main()
{
    const std::size_t N = 2'000'000 ;
    const std::string prefix = "ABCDEFGH" ;

    std::vector<std::string> test_data ;
    for( std::size_t i = 0 ; i < N ; ++i )
        test_data.push_back( prefix + std::to_string(i) ) ;

    std::vector<std::string> vector = test_data ;
    std::ranges::sort(vector) ;

    std::set<std::string> set ;
    for( auto& str : test_data ) set.insert(str) ;

    const std::size_t INC = 2 ;
    std::ranges::shuffle( test_data, std::mt19937( std::random_device{}() ) ) ;

    {
        const auto start = std::clock() ;
        volatile int cnt = 0 ;
        for( std::size_t i = 0 ; i < N ; i+= INC ) cnt = cnt + std::ranges::binary_search( vector, test_data[i] ) ;
        const auto end = std::clock() ;
        std::cout << std::fixed << "vector: " << double(end-start)  / CLOCKS_PER_SEC << " seconds\n" ;
    }

    {
        const auto start = std::clock() ;
        volatile int cnt = 0 ;
        for( std::size_t i = 0 ; i < N ; i+= INC ) cnt = cnt + ( set.find( test_data[i] ) != set.end() ) ;
        const auto end = std::clock() ;
        std::cout << std::fixed << "   set: " << double(end-start)  / CLOCKS_PER_SEC << " seconds\n" ;
    }
}


Typical run on coliru:
vector: 1.141108 seconds
   set: 1.425130 seconds

https://coliru.stacked-crooked.com/a/8a873387396c95f3
Yes. It's a good alternative too. However I have lower perf reading the dictionary file. I guess that the ordered process costs some resources. Obviously there are many ways to create such kind of prog ++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <string>
#include <fstream>
#include <set>
#include <iostream>
#include <string>
#include <chrono>

int main()
{
    std::set<std::string> words;
    std::ifstream infile("dictionary.txt");
    std::string newWord;

    if (!infile.is_open())
    {
        std::cout << "Could not open the dictionary!" << std::endl;
        exit(EXIT_FAILURE);
    }
    // for perf
    auto start = std::chrono::high_resolution_clock::now();
    // create your vector with all words from your dictionary
    while (std::getline(infile, newWord))
    {
        transform(newWord.begin(), newWord.end(), newWord.begin(), ::tolower);
        words.insert(newWord);
    }

    infile.close();
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Our dictionary with " << words.size() + 1 << " words has been read in " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;

    while (1) {
        std::string input;
        std::cout << "Give me a word : " << input;
        std::cin >> input;

        transform(input.begin(), input.end(), input.begin(), ::tolower);
        if (input == "exit") break;

        auto start = std::chrono::high_resolution_clock::now();

        words.contains(input) ? std::cout << "The word " << input << " has been found :)" << std::endl
                              : std::cout << input << " does not exist in the dictionary!" << std::endl;

        auto end = std::chrono::high_resolution_clock::now();
        std::cout << "Operation done in " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << " ns" << std::endl;
    };

    return 0;
}



Our dictionary with 370106 words has been read in 2166 ms
Give me a word : albatroos
albatroos does not exist in the dictionary!
Operation done in 213500 ns
Give me a word : albatros
The word albatros has been found :)
Operation done in 200200 ns
Last edited on
I am very impressed by the perf of your laptop.
Which CPU you have inside?

---------------------------------------------
A few minutes after my previous question :/

Forgive me. I am stupid. Compiling as release - I have better perf of course...


Our dictionary with 370106 words has been read in 116 ms
Give me a word : albatroos
albatroos does not exist in the dictionary!
Operation done in 146000 ns
Give me a word : albatros
The word albatros has been found :)
Operation done in 137500 ns
Last edited on
Which CPU you have inside?


Intel dual core I7 2.6 GHz

but it's just a standard rotating disk - not SSD! :cry:

Try taking the cout statements outside of the search measurement and you'll see even better performance for the search - as std::cout is really very slow...
> However I have lower perf reading the dictionary file.

The vector performance would be an order of magnitude faster than that of the set if the input is already sorted. Otherwise, we would have to sort the vector, and there may not be a huge difference.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
#include <iostream>
#include <ranges>
#include <algorithm>
#include <string>
#include <vector>
#include <set>
#include <random>
#include <ctime>

struct timer
{
    explicit timer( std::string label = {} ) : label( std::move(label) ) {}
    ~timer()
    {
        const auto end = std::clock() ;
        std::cout << std::fixed << label << ": " << double(end-start)  / CLOCKS_PER_SEC << " seconds\n" ;
    }
    const std::clock_t start = std::clock() ;
    const std::string label ;
};

int main()
{
    const std::size_t N = 2'000'000 ;
    const std::string prefix = "ABCDEFGH" ;

    std::vector<std::string> test_data ;
    for( std::size_t i = 0 ; i < N ; ++i )
        test_data.push_back( prefix + std::to_string(i) ) ;

    std::vector<std::string> vector ;
    {
        timer t( "vector insert and sort" ) ;
        vector = test_data ;
        std::ranges::sort(vector) ;
    }

    std::set<std::string> set ;
    {
        timer t( "            set insert" ) ;
        for( auto& str : test_data ) set.insert(str) ;
    }

    const std::size_t INC = 2 ;
    std::ranges::shuffle( test_data, std::mt19937( std::random_device{}() ) ) ;

    {
        timer t( "         vector search" ) ;
        volatile int cnt = 0 ;
        for( std::size_t i = 0 ; i < N ; i+= INC ) cnt = cnt + std::ranges::binary_search( vector, test_data[i] ) ;
    }

    {
        timer t( "            set search" ) ;
        volatile int cnt = 0 ;
        for( std::size_t i = 0 ; i < N ; i+= INC ) cnt = cnt + ( set.find( test_data[i] ) != set.end() ) ;
    }
}


On a typical run on coliru:
vector insert and sort: 1.014847 seconds
            set insert: 1.175835 seconds
         vector search: 1.085210 seconds
            set search: 1.408074 seconds

https://coliru.stacked-crooked.com/a/23f4b94e513c3941
For a large stable dictionary, searching in a sorted vector may be (typically would be) somewhat more efficient.


Ah yes. vector is continuous memory and set is a tree (black/red) constructed from linked memory nodes. I now seem to recall a similar discussion a while ago. But my memory isn't what it was... a similar discussion a while ago. But my memory isn't what it was...

Consider:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <set>
#include <vector>
#include <chrono>
#include <iomanip>
#include <cctype>

int main() {
	const auto lc { [](unsigned char ch) {return static_cast<char>(std::tolower(ch)); } };
	std::set<std::string> words;
	std::vector<std::string> vs;

	if (std::ifstream infile { "words1.txt" }) {
		const auto start { std::chrono::high_resolution_clock::now() };

		for (std::string newWord; std::getline(infile, newWord); words.insert(newWord), vs.push_back(newWord))
			std::transform(newWord.begin(), newWord.end(), newWord.begin(), lc);

		std::ranges::sort(vs);

		const auto end { std::chrono::high_resolution_clock::now() };

		std::cout << "Our dictionary with " << set000s() << words.size() << " words has been read in " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
	} else
		return (std::cout << "Could not open the dictionary!\n"), EXIT_FAILURE;

	while (true) {
		std::string input;

		std::cout << "Give me a word : " << input;
		std::cin >> input;
		std::transform(input.begin(), input.end(), input.begin(), lc);

		if (input == "exit")
			break;

		const auto start { std::chrono::high_resolution_clock::now() };
		const auto fnd { words.contains(input) };
		const auto end { std::chrono::high_resolution_clock::now() };

		const auto start1 { std::chrono::high_resolution_clock::now() };
		const auto fnd1 { std::ranges::binary_search(vs, input)};
		const auto end1 { std::chrono::high_resolution_clock::now() };
		if (fnd && fnd1)
			std::cout << "The word " << std::quoted(input) << " has been found :)\n";
		else
			std::cout << std::quoted(input) << " does not exist in the dictionary!\n";

		std::cout << "Set operation done in " << set000s() << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << " ns\n";
		std::cout << "Vector operation done in " << set000s() << std::chrono::duration_cast<std::chrono::nanoseconds>(end1 - start1).count() << " ns\n";
	}
}



Our dictionary with 466,546 words has been read in 466 ms
Give me a word : albators
"albators" does not exist in the dictionary!
Set operation done in 2,371 ns
Vector operation done in 1,580 ns
Give me a word : zebra
The word "zebra" has been found :)
Set operation done in 4,346 ns
Vector operation done in 1,976 ns
Give me a word : albatroos
"albatroos" does not exist in the dictionary!
Set operation done in 4,346 ns
Vector operation done in 1,976 ns
Give me a word : qwerty
The word "qwerty" has been found :)
Set operation done in 3,556 ns
Vector operation done in 1,580 ns
Give me a word : exit

Last edited on
I've been trying to make the binary search work but it just doesn't, I know its something with the word compares but this is probably version 10 of my function and it still sucks


@dorito200 - Following the interlude, in cases like this use the debugger to trace through the code to see what's happening. You can step through code, set breakpoints and watch the contents of variables. You can then see where the code execution is deviating from what is expected. In this case L19.

Note that you don't need an else clause if the final statement of the if clause is a return.

Try an unordered_set.
:O)
OK. We have a winner! Obviously that's what I meant to say when I said set before.... :) :)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <unordered_set>
#include <vector>
#include <chrono>
#include <iomanip>
#include <cctype>

struct comma_facet : public std::numpunct<char> {
	explicit comma_facet(char del, uint8_t group) :g(group), delim(del) {}
	virtual char do_thousands_sep() const {
		return delim;
	}
	virtual std::string do_grouping() const {
		return std::string(1, g);
	}
	uint8_t g {};
	char delim {};
};

struct set000s {
public:
	explicit set000s(char del = ',', uint8_t grp = 3) : group(grp), delim(del) {}

private:
	uint8_t group {};
	char delim {};

	friend std::ostream& operator<<(std::ostream& os, const set000s& grp) {
		os.imbue(std::locale(os.getloc(), new comma_facet(grp.delim, grp.group)));
		return os;
	}
};

int main() {
	const auto lc { [](unsigned char ch) {return static_cast<char>(std::tolower(ch)); } };
	std::unordered_set<std::string> words;
	std::vector<std::string> vs;

	if (std::ifstream infile { "words1.txt" }) {
		const auto start { std::chrono::high_resolution_clock::now() };

		for (std::string newWord; std::getline(infile, newWord); words.insert(newWord), vs.push_back(newWord))
			std::transform(newWord.begin(), newWord.end(), newWord.begin(), lc);

		std::ranges::sort(vs);

		const auto end { std::chrono::high_resolution_clock::now() };

		std::cout << "Our dictionary with " << set000s() << words.size() << " words has been read in " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
	} else
		return (std::cout << "Could not open the dictionary!\n"), EXIT_FAILURE;

	while (true) {
		std::string input;

		std::cout << "Give me a word : " << input;
		std::cin >> input;
		std::transform(input.begin(), input.end(), input.begin(), lc);

		if (input == "exit")
			break;

		const auto start { std::chrono::high_resolution_clock::now() };
		const auto fnd { words.contains(input) };
		const auto end { std::chrono::high_resolution_clock::now() };

		const auto start1 { std::chrono::high_resolution_clock::now() };
		const auto fnd1 { std::ranges::binary_search(vs, input) };
		const auto end1 { std::chrono::high_resolution_clock::now() };
		if (fnd && fnd1)
			std::cout << "The word " << std::quoted(input) << " has been found :)\n";
		else
			std::cout << std::quoted(input) << " does not exist in the dictionary!\n";

		std::cout << "Set operation done in " << set000s() << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << " ns\n";
		std::cout << "Vector operation done in " << set000s() << std::chrono::duration_cast<std::chrono::nanoseconds>(end1 - start1).count() << " ns\n";
	}
}



Our dictionary with 466,546 words has been read in 404 ms
Give me a word : albatros
The word "albatros" has been found :)
Set operation done in 395 ns
Vector operation done in 1,976 ns
Give me a word : zebra
The word "zebra" has been found :)
Set operation done in 1,581 ns
Vector operation done in 1,580 ns
Give me a word : qwerty
The word "qwerty" has been found :)
Set operation done in 790 ns
Vector operation done in 1,975 ns
Give me a word : albatroos
"albatroos" does not exist in the dictionary!
Set operation done in 790 ns
Vector operation done in 2,371 ns
Give me a word : albators
"albators" does not exist in the dictionary!
Set operation done in 396 ns
Vector operation done in 1,975 ns
Give me a word : exit

Topic archived. No new replies allowed.