I'm a little confused by an assignment I've been given. I want to show the count of each individual word in the text1.txt file.
The banned.txt file gives me an array of banned words. The idea is essentially to compare the two files and find the banned words in text1.txt.
I am also required to count how many times each word from the array appears in text1.txt, but I cannot figure it out. Any help is greatly appreciated.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
//Step 1 read in
ifstream infile("banned.txt"); //The list of search words is in "banned.txt".
string words[8]; //array to store the list of search words (words we need to find)
int wordCount = 0;
if (!infile) //checks we can open the file - validation example
{
cout << "ERROR: ";
cout << "Can't open banned.txt\n";
return 1; //nothing to read, so stop here
}
//before we close this file we want to read the contents into an array, we can do this with a for loop
for (int i = 0; i < 8; ++i)
{
infile >> words[i]; //by using the operator >> it reads from the variable "infile" and stores it into the array called words.
}
//to check we have stored the words we are searching for correctly inside the array, we can then use a cout to output them
for (int i = 0; i < 8; ++i)
{
cout << words[i] << " "; //output each member of the array for debugging purposes.
}
cout << endl;
infile.close(); //we have now finished with this file and all the information from the file has been stored into the array called words
//the next step is to read in the line of characters in that we will use to search for specific words that are stored inside the array called words
string text; //accumulates every line of text1.txt so the whole file can be searched below
string line; //holds the line currently being read
infile.open("text1.txt");
while (getline(infile, line)) //loop on the read itself rather than on eof(), so a failed read is never used
{
cout << line << endl;
text += line + '\n';
}
infile.close();
//Step 2
//Do a comparison between the words from "search1.txt" and the words from "text1.txt".
//You need to find the occurrence of each word from “search1.txt” in the string of characters from “text1.txt”.
//Output the word, whether it has been found and, if found, the index of its location in the array,
for (int i = 0; i < 8; ++i)
{
size_t position = text.find(words[i]); //find() returns size_t, so it can be compared with string::npos safely
cout << "\"" << words[i] << "\", ";
if (position != string::npos)
{
cout << "Found, location " << position << endl;
}
else
{
cout << "Not Found" << endl;
}
}
system("pause");
}
Each "banned word" has two attributes: the string and a count.
* The string is the word itself
* The count is how many times that word has been found in the text.
You could have two arrays:
string words[8];
int counts[8] {}; // all initialized to be zero
When you find word words[k], you can increment counts[k] by one.
This is where struct is handy; only one array is required:
struct Word {
string w;
int count {}; // initialized to 0
};
Word words[8];
When you find word words[k].w, you can increment words[k].count by one.
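To make the struct idea concrete, here is a minimal sketch of the counting step. The helper name countBanned and the choice to split the text on whitespace are my assumptions, not part of the assignment, and the match is exact (case and punctuation sensitive).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// One banned word together with its running count.
struct Word {
    std::string w;
    int count{};  // initialized to 0
};

// Hypothetical helper: for every whitespace-separated token in `text`,
// increment the count of the first banned word it matches exactly.
void countBanned(const std::string& text, Word words[], std::size_t n)
{
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        for (std::size_t k = 0; k < n; ++k) {
            if (token == words[k].w) {
                ++words[k].count;
                break;  // stop at the first matching entry
            }
        }
    }
}
```

After calling countBanned(text, words, 8), a loop over the array can print words[k].w next to words[k].count.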
Suppose the word "you" is banned. Now consider this text in the input file:
"Where are you?" said the man.
"You who?" I replied.
"You've got to be kidding! You know who I am!" he said.
As written, your code won't find a single occurrence. To fix this, you can normalize the words. That means you convert all forms of the word to the same string. For example:
- Remove all leading and trailing punctuation.
- Convert letters to upper (or lower) case.
This still won't catch "you've", but it's a pretty good start. To make it work with your code, add this:
// Normalize a word: convert upper case to lower. Also trim leading and trailing punctuation.
string normalize(string word)
{
// trim punctuation from end
while (word.size() && !isalpha(static_cast<unsigned char>(word.back()))) { // cast: isalpha needs a non-negative value
word.pop_back();
}
// trim leading punctuation
size_t pos = 0;
while (pos < word.size() && !isalpha(static_cast<unsigned char>(word[pos]))) {
++pos;
}
if (pos) {
word.erase(0, pos);
}
for (auto &ch : word) {
ch = static_cast<char>(tolower(static_cast<unsigned char>(ch))); // tolower leaves non-uppercase characters unchanged
}
return word;
}
And in readFile(), change result.push_back(s);
to result.push_back(normalize(s));
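As a rough check of what normalizing buys you, the sketch below counts "you" in the sample text. normalize() is restated in condensed form only so the snippet compiles on its own; countNormalized is a hypothetical helper name of mine, and splitting on whitespace is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Condensed restatement of normalize(): trim non-letters from both ends, then lower-case.
std::string normalize(std::string word)
{
    while (!word.empty() && !std::isalpha(static_cast<unsigned char>(word.back())))
        word.pop_back();
    std::size_t pos = 0;
    while (pos < word.size() && !std::isalpha(static_cast<unsigned char>(word[pos])))
        ++pos;
    word.erase(0, pos);
    for (char& ch : word)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return word;
}

// Hypothetical helper: count the tokens in `text` whose normalized form equals `banned`.
int countNormalized(const std::string& text, const std::string& banned)
{
    std::istringstream in(text);
    std::string token;
    int count = 0;
    while (in >> token) {
        if (normalize(token) == banned)
            ++count;
    }
    return count;
}
```

On the three sample lines above, countNormalized(text, "you") gives 3: it catches `you?"`, `"You` and `You`, but still misses `"You've`.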
This is all really helpful and is much appreciated!
Next I want to filter text1.txt by comparing every word with the list in banned.txt. Right now it matches only whole words, i.e. dog is in banned.txt, so it counts dog in text1.txt. However, I also want it to count words such as doggerel, replace the banned word with asterisks, and write the result to a filtered copy of text1.txt using ofstream.
I can't seem to find any resources online that really explain what I'm trying to do. Any help? Thanks.
FOR each word in text1
IF is_bad( word )
THEN write * out
ELSE write word out
The function is_bad() returns true, if word should be banned. Else it returns false.
A variant:
FOR each word in text1
write censor(word) out
The function censor() returns *, if word should be banned. Else it returns the word.
The point is that the main program does something for the whole text and it uses
functions to do the work. A function like is_bad() or censor() does something with just one word.
Within those functions you can choose a comparison method that you like.
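The pseudocode above could look roughly like this in C++. Treat it as a sketch under assumptions: filter() is a name I made up, the banned list is passed as a std::vector, is_bad() uses the plain whole-word comparison, and censor() writes one asterisk per letter rather than a single *.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Returns true if `word` is exactly one of the banned words.
bool is_bad(const std::string& word, const std::vector<std::string>& banned)
{
    for (const std::string& b : banned) {
        if (word == b)
            return true;
    }
    return false;
}

// Returns a run of asterisks as long as `word` if it is banned, else the word itself.
std::string censor(const std::string& word, const std::vector<std::string>& banned)
{
    return is_bad(word, banned) ? std::string(word.size(), '*') : word;
}

// Hypothetical driver: reads whitespace-separated words from `in` and writes
// their censored form to `out`, separated by single spaces.
void filter(std::istream& in, std::ostream& out, const std::vector<std::string>& banned)
{
    std::string word;
    bool first = true;
    while (in >> word) {
        if (!first)
            out << ' ';
        out << censor(word, banned);
        first = false;
    }
}
```

Because std::ifstream and std::ofstream derive from std::istream and std::ostream, filter() can be called with the opened text1.txt and the output file directly; string streams are handy for trying it out first. Note that reading word by word collapses the original line breaks into spaces.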
word == ban // case and punctuation sensitive, whole word match
normalize(word) == ban // whole word match
normalize(word).find( ban ) != string::npos // substring match; find() returns npos when there is no match
seeplus uses a different approach above: all of text1 is held in memory and modified before the entire text is written out.
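For comparison, an in-memory version of the substring match might look like the sketch below. This is not seeplus's code; maskAll is a hypothetical helper of mine, and it assumes a case-sensitive match that ignores word boundaries, so "dog" also hits "doggerel".

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite every occurrence of `ban` inside `text`
// with asterisks and return how many occurrences were replaced.
std::size_t maskAll(std::string& text, const std::string& ban)
{
    if (ban.empty())
        return 0;  // an empty pattern would match at every position
    std::size_t hits = 0;
    std::size_t pos = 0;
    while ((pos = text.find(ban, pos)) != std::string::npos) {
        text.replace(pos, ban.size(), ban.size(), '*');  // same length, all '*'
        pos += ban.size();                               // continue after this match
        ++hits;
    }
    return hits;
}
```

Calling it once per banned word on the whole file contents gives both the count for that word and the filtered text, which can then be written out with an ofstream in one go.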