split vector and store words

I am trying to read a text file, store each line in a vector, and then split those lines so that each word ends up in a second vector. Basically, I want to read a file and then count how often each common word occurs, without using maps.

string line;
vector<string> file_lines;
if(input_file.is_open())
{
while(getline(input_file, line))
{
for_each(line.begin(), line.end(), [](char & c){
c = ::tolower(c);});

file_lines.push_back(line);
}
}
else cout << "unable to open file";
PLEASE learn to use code tags; they make reading and commenting on source code MUCH easier.

http://www.cplusplus.com/articles/jEywvCM9/
http://www.cplusplus.com/articles/z13hAqkS/

HINT: you can edit your post and add code tags.

Some formatting & indentation would not hurt either.

With that said, here's ONE way of tokenizing a bunch of text into individual words for later analysis (with simulated file read):
#include <iostream>
#include <vector>
#include <string>
#include <sstream>

int main()
{
   // create a simulated file with 3 lines of text, 20 words total
   std::vector<std::string> read_file { "Now is the time for all good men",
                                        "Four Score and Seven Years Ago",
                                        "Split Entire Lines Into Individual Words"};

   // blank vector to hold the individual word tokens
   std::vector<std::string> words;

   // loop through all the 'read' lines to tokenize each one
   for (const auto& itr : read_file)
   {
      // create a stringstream for using stream operators to read each line
      std::istringstream line(itr);
      
      // a string to hold each read word
      std::string word;

      // loop, reading each individual word
      while (line >> word)
      {
         words.push_back(word);
      }
   }

   // verify the correct number of words were tokenized (20, right?)
   std::cout << words.size() << '\n';
}
The main issue is what constitutes a 'word'. Is dog the same as Dog, as dog., as dog!, as "dog, and so on? Do you need to remove all punctuation from each word and convert it to, say, lowercase for the purposes of counting frequency?
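If the answer to those questions is yes, one possible normalization is sketched below. The helper name normalize_word is made up for illustration, and the rule it applies (keep only letters and digits, lowercase everything) is an assumption about what should count as a word, not something the original poster specified.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercases a token and drops every character that is
// not a letter or digit, so "Dog", "dog." and "\"dog" all become "dog".
std::string normalize_word(const std::string& token)
{
   std::string result;

   // iterate as unsigned char: passing a negative char to the <cctype>
   // functions is undefined behaviour
   for (unsigned char c : token)
   {
      if (std::isalnum(c))
      {
         result += static_cast<char>(std::tolower(c));
      }
   }

   return result;
}
```

Note that this rule also strips apostrophes and hyphens, so "don't" becomes "dont", and a token made only of punctuation comes back empty and should be skipped before counting.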

getline() will obtain one line from the file, and using std::istringstream and extraction (>>), each whitespace-delimited token can be extracted (as per George's post above). The issue is then what to do with these tokens...

How are you going to get the frequency count from these vectors of 'words'? Why not use std::map? It's made for something like this.
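If std::map really is off the table, one way to get the counts from a plain vector is to sort it so equal words sit next to each other and then count the runs. This is only a sketch: the function name count_words and the WordCount alias are invented here, and it assumes the words have already been lowercased and stripped of punctuation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: a word paired with the number of times it occurs
using WordCount = std::pair<std::string, std::size_t>;

// Hypothetical helper: returns (word, count) pairs, most frequent first.
// The vector is taken by value, so the caller's copy is left unsorted.
std::vector<WordCount> count_words(std::vector<std::string> words)
{
   // after sorting, identical words are adjacent
   std::sort(words.begin(), words.end());

   std::vector<WordCount> counts;
   for (const auto& word : words)
   {
      if (!counts.empty() && counts.back().first == word)
      {
         ++counts.back().second;        // same word as the previous one
      }
      else
      {
         counts.emplace_back(word, 1);  // first occurrence of a new word
      }
   }

   // most frequent first; stable_sort keeps ties in alphabetical order
   std::stable_sort(counts.begin(), counts.end(),
                    [](const WordCount& a, const WordCount& b)
                    { return a.second > b.second; });

   return counts;
}
```

Printing the first few entries of the returned vector gives the most common words with their counts. With std::map<std::string, std::size_t>, the sort-and-scan part would shrink to a single ++counts[word] per word.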