Why not put the entire line in a string stream?
You will have to do this eventually, but you also need a way to deal with characters like periods, parentheses, ampersands, exclamation marks, etc. Unlike in the OP's example, I'd count 'expectations' rather than 'expectations!' as the word: dropping the punctuation standardizes the word for comparison with any further occurrences of 'expectations'.
In the following code, the my_ctype class (derived from std::ctype<char>) handles this. There are more comments in the body of the program than I usually write on how my_ctype works, but essentially a my_ctype object is used to construct an augmented locale (here, x; see the reference for std::locale) that is then passed to the stringstream (here, stream) via its imbue() method. This gives the stream a user-defined set of characters to treat as delimiters in addition to whitespace. The results are saved in a vector<string>, and the map<string,int> of word counts is then built from the vector. If anything is still unclear after reading and some research, do come back.
#include <locale>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
using namespace std;

//From cppreference.com (mostly): class ctype encapsulates character classification features. All stream input
//operations performed through std::basic_istream<charT> use the std::ctype<charT> facet of the locale imbued in the
//stream to identify whitespace characters for input tokenization. A locale, in turn, includes a ctype facet that
//classifies character types. A facet that classifies further characters as whitespace could look like this:
class my_ctype : public ctype<char>{
private:
    mask my_table[table_size]; //one classification bitmask per char value;
public:
    //the base class keeps a pointer to my_table (filled in below) and, with del == false, never deletes it;
    explicit my_ctype(size_t refs = 0) : ctype<char>(&my_table[0], false, refs){
        copy_n(classic_table(), table_size, my_table); //start from the standard "C" classification;
        my_table['-'] = space; //reclassify each delimiter as whitespace;
        my_table['\''] = space;
        my_table['('] = space;
        my_table[')'] = space;
        my_table['!'] = space;
        my_table[','] = space;
        my_table['/'] = space;
        my_table['.'] = space;
        my_table['%'] = space;
        my_table['&'] = space; //sample set; can be expanded/modified depending on the delimiters being handled;
    }
};

int main(){
    ifstream File("F:\\test.txt");
    vector<string> v;
    //locale built from classic() plus the my_ctype facet; with refs == 0 the locale owns the facet
    //and deletes it when the last locale using it is destroyed, so the raw new does not leak;
    locale x(locale::classic(), new my_ctype);
    if(File.is_open()){//no error-handling here, OP to consider;
        string line;
        while(getline(File, line)){//loop on getline() rather than eof() so a failed read is never processed;
            stringstream stream(line);
            stream.imbue(x);//imbue sets the locale of the stream object;
            //istream_iterator<string> reads with operator>>, which now also splits on the delimiters above;
            copy(istream_iterator<string>(stream), istream_iterator<string>(), back_inserter(v));
            //to print to screen instead, use ostream_iterator<string>(cout, "\n") as the destination;
        }
    }
    map<string, int> m;
    for(auto& itr : v){//creating the map from the vector elements;
        ++m[itr];
    }
    for(auto& itr : m){
        cout << itr.first << " : " << itr.second << "\n";//printing the map;
    }
}
Sample text
Investors have been feeling better about Apple Inc. these days. The company's stock price had climbed 31 percent since closing at a two-year low in May. Expectations changed to reflect the realization that Apple's go-go days are behind it, at least for now. Apple's less-supercharged new reality felt ... fine.
Apple shares have rebounded 31% since a May low as investors grew more comfortable with the company's slowing-to-negative revenue growth.
CEO Tim Cook declined to comment on recent reports that the Apple Watch production would be discontinued shortly on falling demand.
Output
31 : 2
Apple : 5
CEO : 1
Cook : 1
Expectations : 1
Inc : 1
Investors : 1
May : 2
The : 1
Tim : 1
Watch : 1
a : 2
about : 1
are : 1
as : 1
at : 2
be : 1
been : 1
behind : 1
better : 1
changed : 1
climbed : 1
closing : 1
comfortable : 1
comment : 1
company : 2
days : 2
declined : 1
demand : 1
discontinued : 1
falling : 1
feeling : 1
felt : 1
fine : 1
for : 1
go : 2
grew : 1
growth : 1
had : 1
have : 2
in : 1
investors : 1
it : 1
least : 1
less : 1
low : 2
more : 1
negative : 1
new : 1
now : 1
on : 2
percent : 1
price : 1
production : 1
reality : 1
realization : 1
rebounded : 1
recent : 1
reflect : 1
reports : 1
revenue : 1
s : 4
shares : 1
shortly : 1
since : 2
slowing : 1
stock : 1
supercharged : 1
that : 2
the : 3
these : 1
to : 3
two : 1
with : 1
would : 1
year : 1
PS: more can be done, such as converting the strings in the vector to lower case so that 'At' and 'at' (or 'Investors' and 'investors' in the output above) are counted as the same word, but I'm not focusing on that aspect here.