How I read a file word by word

I tend to do something like this:

#include <iostream>
#include <string>
#include <regex>
#include <vector>
#include <fstream>

std::vector<std::string> split_into_words( const std::string& text )
{
    static const std::regex word_re( "\\w+" ) ;

    std::vector<std::string> words ;

    std::sregex_iterator iter( text.begin(), text.end(), word_re ) ;
    const std::sregex_iterator end ;
    for( ; iter != end ; ++iter ) words.push_back( iter->str() ) ;

    return words ;
}

std::vector<std::string> get_words( std::istream& stm )
{
    std::vector<std::string> words ;

    std::string line ;
    while( std::getline( stm, line ) )
    {
        auto vec = split_into_words(line) ;
        for( auto& wrd : vec ) words.push_back( std::move(wrd) ) ;
    }

    return words ;
}

std::vector<std::string> get_words_in_file( const std::string& path )
{
    if( std::ifstream file{path} ) return get_words(file) ;
    else return {} ; // failed to open file. throw something?
}

int main() // test driver
{
    const std::string path_to_file = "test.txt" ;

    // create a test file and print out the words extracted from it
    std::ofstream(path_to_file) << "'Twas brillig, and the slithy-toves !@#$%^&*\n"
                                   "Did gyre! (and gimble) in the wabe:\n"
                                   "All \"mimsy\" were the borogoves,\n"
                                   "And the mome raths outgrabe.\n" ;

    std::cout<< "file contains:\n----------\n" << std::ifstream(path_to_file).rdbuf() << "\nwords:\n--------\n" ;
    for( const auto& word : get_words_in_file(path_to_file) ) std::cout << word << '\n' ;
}

http://coliru.stacked-crooked.com/a/a183bd19014db1b6
... and the OP disappears in a puff of smoke...
I'd hazard a guess that the Wizards of The Lounge don't like C++ code intruding on their quiet glade of contemplation of "anything but talk about C++".
This is now an exercise: write the most optimized class you can for reading words from a file. Go!

...anyone?

-Albatross
I'd need 15 feet of rubber tubing, 5 gallons of lubricant and a Yak to do it.
But first, define what a 'word' is.
I usually just open the file with something like Word or Notepad and start reading it. I don't see why you need a program.
but first define what is a 'word'.

"In The Beginning..."
Without using regex, and treating '-' as a separator between two words, consider this (no file-open error detection):

#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <cctype>
#include <ranges>

namespace rngs = std::ranges;

auto get_words(const std::string& fn) {
	static constexpr char split { '-' };
	std::ifstream ifs(fn);
	std::vector<std::string> words;

	for (std::string wrd; ifs >> wrd; ) {
		std::string w;

		rngs::copy_if(wrd, std::back_inserter(w), [](unsigned char ch) {return ch == split || std::isalpha(ch); });

		for (const auto& word : rngs::split_view(w, split))
			if (word.begin() != word.end())
				words.emplace_back(word.begin(), word.end());
	}

	return words;
}

int main() {
	const std::string path_to_file { "test.txt" };

	std::ofstream(path_to_file) << "'Twas brillig, and the slithy-toves !@#$%^&*\n"
		"Did gyre! (and gimble) in the wabe:\n"
		"All \"mimsy\" were the borogoves,\n"
		"And the mome raths outgrabe.\n";

	std::cout << "file contains:\n----------\n" << std::ifstream(path_to_file).rdbuf() << "\nwords:\n--------\n";

	for (const auto& w : get_words(path_to_file))
		std::cout << w << '\n';
}


Output:

file contains:
----------
'Twas brillig, and the slithy-toves !@#$%^&*
Did gyre! (and gimble) in the wabe:
All "mimsy" were the borogoves,
And the mome raths outgrabe.

words:
--------
Twas
brillig
and
the
slithy
toves
Did
gyre
and
gimble
in
the
wabe
All
mimsy
were
the
borogoves
And
the
mome
raths
outgrabe


https://wandbox.org/permlink/zc0f2MZeFvZeegiP