I hope this is the right place to post this.....
I am currently having to write a program that opens a file and figures out which word appears the most. It has to be able to deal with punctuation and numbers so that they don't affect the count. In other words, I have to make sure "wow.", "wow!", and "wow?" are all counted as just "wow", and if it's something like "S20" I need to delete the numbers and keep the "S".
I think I can figure that part out, but the part I'm having trouble with is that if the string is something like "h3llo", I have to delete the 3, then keep the "h" and "llo" as two separate words.
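To make the "h3llo" requirement concrete, here is a rough sketch of one way the splitting might work: treat every non-letter as a separator and collect each run of letters as its own word. `SplitIntoWords`, `pieces`, and `maxPieces` are made-up names for illustration, and it only uses `std::string`, plain arrays, and `<cctype>`, on the assumption that those are allowed under the no-STL rule.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the original program): copies each run of
// letters in `entry` into `pieces`, lowercased, and returns how many runs were
// stored. Digits and punctuation act as separators, so "h3llo" gives "h" and "llo".
int SplitIntoWords(const std::string& entry, std::string pieces[], int maxPieces)
{
    int found = 0;
    std::string current;  // letters collected since the last separator

    for (std::string::size_type i = 0; i < entry.length(); i++)
    {
        // Cast to unsigned char so the <cctype> functions are safe for any byte.
        unsigned char ch = static_cast<unsigned char>(entry[i]);

        if (std::isalpha(ch))
        {
            current += static_cast<char>(std::tolower(ch));
        }
        else if (!current.empty())
        {
            // A separator ends the current word; store it if there is room.
            if (found < maxPieces)
                pieces[found++] = current;
            current.clear();
        }
    }

    // Store the last word if the entry did not end with a separator.
    if (!current.empty() && found < maxPieces)
        pieces[found++] = current;

    return found;
}
```

With this approach "I'm" also comes out as two words, "i" and "m", which may or may not be what the assignment wants.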
Here is what I have so far...(I know I'm an amateur, please be kind)
#include <iostream>
#include <string>
#include <cstring>
#include <fstream>
#include <cctype>
using namespace std;

const int MAX = 20;
string words[MAX];
int count[MAX];
int counter=0;

string Format(string entry)
{
    int size = entry.length();
    for(int chr = 0; chr < size; chr++)
    {
        if (isdigit(entry[chr]))
            entry[chr] = ' ';
        if (isupper(entry[chr]))
            entry[chr] = tolower(entry[chr]);
        if (ispunct(entry[chr]))
            entry[chr] = ' ';
    }
    return entry;
}

void WordCounter(string word)
{
    for (int i = 0; i < counter; i++)
        if(word == words[i])
            count[i]++;
    words[counter] = word;
    count[counter] = 1;
    counter++;
}

int main(int argc, char* argv[])
{
    ifstream Input("3.txt");
    string word, word2;
    if(!Input)
        cout << "There is something wrong with the file!" << endl;
    while (Input >> word)
    {
        word2 = Format(word);
        WordCounter(word2);
    }
    for (int i = 0; i < counter; i++)
        cout << words[i] << " " << count[i] << endl;
    return 0;
}
This is the contents of the file that I'm currently using to test this, but I will need it to be able to process other files...
How are you?
Fine! Thanks. And you?
I'm fine, too.
I can't use any STL code, and so far all the output is just to show me what the program is doing.
Here is the current output...
how 1
are 1
you 2
fine 2
thanks 1
and 1
you 1
i m 1
fine 1
too 1
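The repeated "you" and "fine" lines in that output look like they come from the counting step adding a new entry even after it has found a match. A minimal sketch of a version that stops once the word is found might look like this. The names `CountWord`, `seenWords`, `seenCounts`, `seenTotal`, and `MAX_WORDS` are made up for illustration, and the capacity of 20 is only assumed from the program above.

```cpp
#include <string>

const int MAX_WORDS = 20;            // assumed capacity, mirroring MAX in the program
std::string seenWords[MAX_WORDS];    // hypothetical storage for distinct words
int seenCounts[MAX_WORDS];           // seenCounts[i] is the tally for seenWords[i]
int seenTotal = 0;                   // number of distinct words stored so far

// Hypothetical replacement for the counting step: bumps the tally of a word
// that is already stored, otherwise appends it with a tally of 1.
void CountWord(const std::string& word)
{
    if (word.empty())
        return;  // nothing to count

    for (int i = 0; i < seenTotal; i++)
    {
        if (seenWords[i] == word)
        {
            seenCounts[i]++;
            return;  // already listed, so do not append a second entry
        }
    }

    // First time this word is seen; store it only if the array has room.
    if (seenTotal < MAX_WORDS)
    {
        seenWords[seenTotal] = word;
        seenCounts[seenTotal] = 1;
        seenTotal++;
    }
}
```

The bounds check matters for larger files, since a fixed array of 20 entries would otherwise be overrun once more than 20 distinct words appear.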
EDIT: I forgot to mention that if more than one word is tied for most common, I have to state all of them. So for this file I would need to state something like
you: 2
fine: 2