I need to implement a function that reads a large text file, counts how many times each word occurs, and prints only the words that occur more often than the “threshold” parameter, together with their occurrence counts, one word-count pair per line. The higher the threshold, the fewer words will meet it and be printed. A threshold of 0 prints all the words.

We are not worried about punctuation and similar details here: treat white space as the only separator (so you can use >> without doing anything more). However, letter case should be ignored. For example, the fragment “This IS TRUE. THIS is not.” has these words: “this” x 2, “is” x 2, “true.” and “not.”

I must implement this function without using a map. I must use only one vector of strings and nothing else, and I must use iterators to work with the vector, not array index notation. To turn a word into all lowercase I can use the STL transform:
transform(word.begin(), word.end(), word.begin(), ::tolower);
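As a minimal sketch of that call applied to a single word (the helper name lowercased is made up for illustration and is not part of the assignment): passing ::tolower directly usually works for plain ASCII text, but std::tolower has undefined behavior for negative char values, so this version converts through unsigned char first.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the assignment): returns a lowercase copy of word.
std::string lowercased(std::string word) {
    // Convert each character in place; going through unsigned char avoids
    // handing std::tolower a negative value.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}
```

Punctuation is left alone, so "TRUE." becomes "true.", which matches the example in the task description.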
/**
 * Print a word and the number of times it occurs only if it occurs
 * more often than the given threshold
 */
void printIf(string word, int occurrences, int threshold) {
    if (occurrences > threshold) {
        cout << word << " - " << occurrences << endl;
    }
}
void printCommonWords1(string& filename, int threshold) {
    // your code here
    // you MUST use ONLY ONE VECTOR OF STRING (NO STRUCTS) to store the data, and NO MAP OR ANYTHING ELSE.
    // you MUST use ITERATORS to access the data in the vector and NOT [index] NOTATION
    // to print any values call the function above
    vector<string> myWords;
    vector<string>::iterator it;
    while (cin >> filename) //no idea if this is correct
        myWords.push_back(filename);
    //should I use transform with iterators?
    //how do i get the count for each word?
    for (it = myWords.begin(); it != myWords.end(); it++) {
        printIf(*it, count, threshold);
    }
}
The first step is to read all the words from the file, convert them to lowercase, and store them in the vector.
Forget everything else for the moment until you have done that.
void printCommonWords1(string& filename, int threshold) {
    string word;
    int* wordCountArray;
    vector<string> myWords;
    vector<string>::iterator it;
    ifstream myFile(filename);
    if (!myFile)
        cout << "Could not open file" << endl;
    while (myFile >> word)
        myWords.push_back(word);
    transform(myWords.begin(), myWords.end(), myWords.begin(), ::tolower);
    for (it = myWords.begin(); it != myWords.end(); it++) {
    }
    myFile.close();
}
But now I don't know how to keep the count for each word in the vector. Should I use a separate wordCountArray? I need to be able to pass the count for each word to printIf.
I think the myWords vector must contain different words. The way I did it stores every single word in the vector even if it is a word that has been stored previously. I am confused as hell.
// you MUST use ONLY ONE VECTOR OF STRING (NO STRUCTS) to store the data, and NO MAP OR ANYTHING ELSE.
One way I see is to count each word just before you print it.
You could write a function int WordCount that counts the number of occurrences of a word in the vector.
Then, for each word in the vector, you get its count, and if the count is greater than the threshold you print it.
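A rough sketch of this count-before-print idea, under some assumptions: the names wordCount and printCounts are invented here, the words are already lowercase, and the output goes to a caller-supplied stream in the same "word - count" format as printIf so that it can be checked. Because the vector still holds every repetition, the loop only reports a word at its first position; otherwise a word occurring three times would be printed three times.

```cpp
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: how many times word appears in words.
int wordCount(const std::vector<std::string>& words, const std::string& word) {
    return static_cast<int>(std::count(words.begin(), words.end(), word));
}

// Hypothetical driver: writes "word - count" for each distinct word whose
// count is greater than threshold, in order of first appearance.
void printCounts(const std::vector<std::string>& words, int threshold, std::ostream& out) {
    for (std::vector<std::string>::const_iterator it = words.begin(); it != words.end(); ++it) {
        // If the same word already occurs earlier in the vector, it was handled there.
        if (std::find(words.begin(), it, *it) != it) continue;
        const int occurrences = wordCount(words, *it);
        if (occurrences > threshold) out << *it << " - " << occurrences << '\n';
    }
}
```

This uses only the one vector and iterators, but it rescans the vector for every word, so the work grows with the square of the number of words. For a large file, sorting first is likely to be much faster.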
#include <iostream>
#include <string>
#include <cctype>
#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <iomanip>
std::string to_lower( std::string str )
{
    for( char& c : str ) c = std::tolower(c) ;
    return str ;
}
std::vector<std::string> get_words( std::string file_name )
{
    std::vector<std::string> result ;
    std::ifstream file(file_name) ;
    std::string word ;
    while( file >> word ) result.push_back( to_lower(word) ) ;
    return result ;
}
void print_common_words( std::string file_name, int threshold )
{
    std::vector<std::string> words = get_words(file_name) ;
    auto iter = std::begin(words) ;
    const auto end = std::end(words) ;
    std::sort( iter, end ) ; // sort the vector to make repetitions appear next to each other
    while( iter != end )
    {
        // http://en.cppreference.com/w/cpp/algorithm/upper_bound
        const auto next_word = std::upper_bound( iter, end, *iter ) ;
        // http://en.cppreference.com/w/cpp/iterator/distance
        const auto frequency = std::distance( iter, next_word ) ; // frequency of this word
        if( frequency > threshold ) std::cout << std::quoted(*iter) << " x " << frequency << '\n' ;
        iter = next_word ; // move to the next word
    }
}
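To unpack the two library calls used in that loop, here is a small isolated sketch (the function name firstRunLength is made up for illustration). In a sorted range, std::upper_bound returns an iterator to the first element that is greater than the given value, so when called with the first word it lands just past the last copy of that word. std::distance then counts the steps between the two iterators, which is the number of copies.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: number of copies of the first word in an already
// sorted vector (0 if the vector is empty).
long firstRunLength(const std::vector<std::string>& sorted_words) {
    if (sorted_words.empty()) return 0;
    std::vector<std::string>::const_iterator first = sorted_words.begin();
    // First position whose word compares greater than *first.
    std::vector<std::string>::const_iterator next_word =
        std::upper_bound(first, sorted_words.end(), *first);
    // Number of elements in [first, next_word).
    return static_cast<long>(std::distance(first, next_word));
}
```

For the sorted words "is", "is", "not.", "this", "this", "true." this gives 2, the number of times "is" occurs.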
Thank you, but that is way beyond my understanding.
My problem now is counting each word in the myWords vector and passing the word and its number of occurrences to printIf.
void print_common_words( std::string file_name, int threshold )
{
    std::vector<std::string> words = get_words(file_name) ;
    std::sort( words.begin(), words.end() ) ; // sort the vector to make repetitions appear next to each other
    auto iter_current_word = words.begin() ;
    while( iter_current_word != words.end() )
    {
        const std::string& this_word = *iter_current_word ;
        // the vector is sorted; locate the next (different) word in the vector
        auto iter_next_word = iter_current_word ;
        while( iter_next_word != words.end() && *iter_next_word == this_word ) ++iter_next_word ;
        const auto occurrences = iter_next_word - iter_current_word ; // occurrences of this word
        printIf( this_word, occurrences, threshold ) ;
        iter_current_word = iter_next_word ; // move to the next word
    }
}
That is a bit easier. So, what is get_words? I guess I can use this instead:
ifstream myFile(filename);
std::vector<std::string> words;
if (!myFile)
    cout << "Could not open file" << endl;
while (myFile >> word)
    words.push_back(word);
Also, what can I use instead of "auto"?
Could you explain the while loops?
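On both questions, a sketch with the types written out may help. It assumes words is a non-const std::vector<std::string>; the typedef names and the two function names are invented for illustration. For such a vector, auto deduces std::vector<std::string>::iterator for words.begin(), and std::vector<std::string>::difference_type for the result of subtracting two of its iterators. The inner while loop walks past every copy of the current word; the outer while loop repeats that, one distinct word per pass, until the end of the vector.

```cpp
#include <string>
#include <vector>

// Spelled-out types (the typedef names are made up for this sketch).
typedef std::vector<std::string>::iterator WordIter;         // type of words.begin()
typedef std::vector<std::string>::difference_type WordDiff;  // type of iterator - iterator

// Inner loop: starting at current, step forward while the word is the same.
// Returns the position of the next different word, or end.
WordIter skipRun(WordIter current, WordIter end) {
    WordIter next = current;
    // Check next != end first so that end is never dereferenced.
    while (next != end && *next == *current) ++next;
    return next;
}

// Outer loop: each pass handles one distinct word, then jumps to the next one.
// Here it only counts how many distinct words a sorted vector contains.
WordDiff countDistinct(std::vector<std::string>& sorted_words) {
    WordDiff distinct = 0;
    WordIter current = sorted_words.begin();
    while (current != sorted_words.end()) {
        current = skipRun(current, sorted_words.end());
        ++distinct;
    }
    return distinct;
}
```

In the function above, skipRun(current, end) - current would be the number of occurrences of the current word, which is the value passed on to printIf.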
if (!myFile)
    cout << "Couldn't open the file" << endl;
while (myFile >> word) //get words one by one (ignoring white spaces) and push them into the string vector
{
    transform(word.begin(), word.end(), word.begin(), ::tolower);
    words.push_back(word);
}
std::sort(words.begin(), words.end()); // sort the vector to make repetitions appear next to each other
it = words.begin();
while ( it != words.end() ) {
    currentWord = it;
    count = 0;
    while (it == (it + 1)) {
        count++;
        it++;
    }
    count++;
    it++;
    printIf(*currentWord, count, threshold);
}
myFile.close();
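One thing to watch in the loop above: it == (it + 1) compares two positions, not two words, so it is never true and every word ends up with a count of 1. The words have to be compared by dereferencing the iterators, and the end of the vector has to be checked before dereferencing. A possible corrected sketch follows, with some assumptions made so it can be tried in isolation: it reads from any input stream rather than opening a file, and printIfTo is a stand-in for printIf that writes the same format to a caller-supplied stream. Both function names are invented here.

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for printIf (assumption: same output format, but to a given stream).
void printIfTo(std::ostream& out, const std::string& word, int occurrences, int threshold) {
    if (occurrences > threshold) out << word << " - " << occurrences << '\n';
}

// Hypothetical variant of printCommonWords1 that reads words from any stream.
void printCommonWordsFrom(std::istream& in, int threshold, std::ostream& out) {
    std::vector<std::string> words;  // the only container used
    std::string word;
    while (in >> word) {             // >> splits on white space
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        words.push_back(word);
    }
    std::sort(words.begin(), words.end());  // repetitions become adjacent

    std::vector<std::string>::iterator it = words.begin();
    while (it != words.end()) {
        std::vector<std::string>::iterator currentWord = it;
        int count = 0;
        // Compare the words themselves, and test for the end before dereferencing.
        while (it != words.end() && *it == *currentWord) {
            ++count;
            ++it;
        }
        printIfTo(out, *currentWord, count, threshold);
    }
}
```

Fed the fragment “This IS TRUE. THIS is not.” with a threshold of 1, this writes "is - 2" and "this - 2", one per line, in sorted order.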