Reading words from a file
Mar 3, 2013 at 3:40am UTC
I'm trying to extract words from a text file and then put them into a set. I want to split on spaces as well as periods. My code works fine for spaces, but it isn't stripping the periods at the end of some words. Can someone help me out?
void Dict::get_words(string file)
{
    ifstream file(file);
    int end, beginning = 0;
    string word;
    string line;
    ifstream file("file.txt");
    if (file.is_open())
    {
        while (!file.eof())
        {
            getline(file, line);
            end = line.find(" ");
            word = line.substr(beginning, end);
            words.insert(word);
            beginning = end + 1;
        }
    }
    file.close();
} //get_words
Last edited on Mar 3, 2013 at 4:27am UTC
Mar 3, 2013 at 4:49am UTC
How are even the spaces being handled correctly? I think you can try something like the following. Disclaimer: I'm a newbie myself, and the code is just an attempt to solve your problem and learn a bit in the process.
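// The delimiter must be a char ('.'), not a string literal ("."),
// and testing getline itself replaces the eof() check.
while (getline(file, line, '.')) {
    string::size_type beginning = 0, end;
    do {
        // Search from the current position; newlines also separate words here
        end = line.find_first_of(" \t\n", beginning);
        // substr takes (position, length), not (begin, end)
        word = line.substr(beginning, end - beginning);
        if (!word.empty())
            words.insert(word);
        beginning = end + 1;
    } while (end != string::npos);
}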
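To make the two pitfalls in the find/substr loop concrete, here is a minimal sketch of the same idea as a standalone function. The name `split_words` and its default delimiter set are assumptions made for illustration; they do not come from the original code. It searches from the current position rather than from the start of the string, and it passes a length (not an end index) to `substr`.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not from the thread): split `text` on any
// character found in `delims` and collect the non-empty pieces.
std::set<std::string> split_words(const std::string& text,
                                  const std::string& delims = " .\t\n")
{
    std::set<std::string> words;
    std::string::size_type beginning = 0;
    while (beginning < text.size()) {
        // Search from `beginning`, not from the start of the string.
        std::string::size_type end = text.find_first_of(delims, beginning);
        if (end == std::string::npos)
            end = text.size();          // last word runs to the end
        // substr takes (position, length), so pass end - beginning.
        if (end > beginning)
            words.insert(text.substr(beginning, end - beginning));
        beginning = end + 1;            // step past the delimiter
    }
    return words;
}
```

Calling `split_words("the cat sat. the dog ran.")` should give the five distinct words with no trailing periods and no empty strings.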
Mar 3, 2013 at 5:23am UTC
There are several ways.
Personal favorite is to redefine "whitespace" to include periods (because C++ streams are that tunable):
#include <iostream>
#include <fstream>
#include <locale>
#include <sstream>
#include <set>
#include <vector>
#include <string>

struct period_ws : std::ctype<char> {
    static const mask* make_table()
    {
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        v['.'] |= space; // period will be classified as whitespace
        return &v[0];
    }
    period_ws(std::size_t refs = 0) : ctype(make_table(), false, refs) {}
};

int main()
{
    std::ifstream f("test.txt");
    f.imbue(std::locale(f.getloc(), new period_ws()));
    std::set<std::string> words;
    std::string word;
    while (f >> word) // just forget "file.eof()" exists
        words.insert(word);
    for (auto& s : words)
        std::cout << "'" << s << '\'' << '\n';
}
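Because the facet is attached to the stream rather than to the file, the same trick can be tried on an in-memory stream, which is handy for checking the behaviour without a test file. The sketch below repeats the facet so it stands alone; the helper name `read_words` is an assumption for illustration, not part of the original answer.

```cpp
#include <cstddef>
#include <istream>
#include <locale>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Same facet as above: classify '.' as whitespace.
struct period_ws : std::ctype<char> {
    static const mask* make_table()
    {
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        v['.'] |= space;
        return &v[0];
    }
    period_ws(std::size_t refs = 0) : ctype(make_table(), false, refs) {}
};

// Hypothetical helper: works on any std::istream (file or string stream).
std::set<std::string> read_words(std::istream& in)
{
    // The locale takes ownership of the facet, so no delete is needed.
    in.imbue(std::locale(in.getloc(), new period_ws()));
    std::set<std::string> words;
    std::string word;
    while (in >> word)      // operator>> now stops at periods too
        words.insert(word);
    return words;
}
```

For example, feeding it `std::istringstream in("Hello world. Hello again.\nBye.");` should yield the four distinct words Hello, world, again and Bye.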
You could also use Boost.Tokenizer, which skips periods by default:
#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
    std::ifstream f("test.txt");
    std::istreambuf_iterator<char> beg(f), end;
    std::string total(beg, end);
    boost::tokenizer<> tok(total);
    std::set<std::string> words(tok.begin(), tok.end());
    for (auto& s : words)
        std::cout << "'" << s << '\'' << '\n';
}
Or, you could follow your approach: read line by line... Again, several ways, here's a simple one:
#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <sstream>
#include <algorithm>

int main()
{
    std::ifstream f("test.txt");
    std::set<std::string> words;
    std::string line;
    while (getline(f, line))
    {
        std::replace(line.begin(), line.end(), '.', ' '); // replace periods with spaces
        std::istringstream buf(line); // then use standard parse
        std::string word;
        while (buf >> word)
            words.insert(word);
    }
    for (auto& s : words)
        std::cout << "'" << s << '\'' << '\n';
}
Last edited on Mar 3, 2013 at 5:29am UTC
Mar 4, 2013 at 9:25pm UTC
I'm using this code to extract words from a file. The program works, but it only gets words from the first line. I can't get it to grab words from every line until EOF, and I can't figure out why.
#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <algorithm>
#include <cctype>
#include <sstream>
using namespace std;

int main()
{
    string temp, sentence;
    stringstream iss;
    set<string> sentences;
    ifstream file("thisfile.txt");
    if (file.is_open())
    {
        while (!file.eof())
        {
            while (getline(file, temp, '.'))
            {
                iss << temp;
                while (getline(iss, sentence, ' '))
                {
                    sentences.insert(sentence);
                }
            }
        }
    }
    file.close(); //close file
    for (set<string>::const_iterator it = sentences.begin();
         it != sentences.end(); it++)
    {
        cout << *it << endl;
    }
    return 0;
} // get_sentences
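A likely cause, offered as a sketch rather than a confirmed diagnosis: once the inner `getline(iss, ...)` loop runs out of input, `iss` is left with its eof and fail flags set, so every later `iss << temp` is ignored and only the text before the first period is ever parsed. One way around that is to build a fresh `std::istringstream` for each chunk (calling `iss.clear()` before reuse is another option). Using `>>` instead of `getline(..., ' ')` also skips the newlines that end up inside each chunk. The function name `collect_words` is an assumption for illustration.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: read period-terminated chunks, then split each
// chunk on whitespace.
std::set<std::string> collect_words(std::istream& in)
{
    std::set<std::string> words;
    std::string chunk;
    // getline's result is the loop condition, so no eof() test is needed.
    while (std::getline(in, chunk, '.')) {
        // A fresh stream per chunk starts with clean state flags.
        std::istringstream iss(chunk);
        std::string word;
        // operator>> skips spaces and embedded newlines alike.
        while (iss >> word)
            words.insert(word);
    }
    return words;
}
```

Passing an open `std::ifstream` to `collect_words` should then pick up words from every line, not just the first.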