How to create a word and sentence counter

I am trying to write a program to determine the number of words, number of sentences, as well as the average words per sentence. I'm really unsure of where to start. Any ideas or suggestions on how to start would be very much appreciated.

Thank you,
Noah



First, how would you do this using pen and paper? What constitutes a 'word', and what a 'sentence'? What instructions would you give someone else who doesn't know anything about words or sentences to do this? Once you know this, you can produce a program design and then code the program from the design. First design, then code.
I think I have to find the number of white spaces and add one in order to find the number of words, but I'm also required for this assignment to use a sentinel loop.
For words you can probably count how many strings your stream will allow you to extract.

Sentences? Another matter entirely ...

How many sentences have I written here ... and how would you know?
To count the number of sentences I would have to count the periods, question marks, and exclamation points. Below is my code so far; I've attempted to create a program that counts the number of spaces.

int main(int argc, char const *argv[]) {
string sentence;
char wordnum = ' ';
int count;
cout << "Enter a paragraph: ";
cin >> sentence;
getline(cin,sentence);
for(int i = 0; i < sentence.length(); i++){
if (sentence[i]== wordnum){
count++;
}
}
cout << count;
}
I have a completed working solution. I guess I'll hold off on posting it, but I will say that good organization is key and counting spaces is probably a waste of time. You can pass a delimiter to getline so that it breaks a sentence up into words in a loop. I didn't account for every type of punctuation, just because that's no fun and seems like a pain.
I've changed my code and I think I'm on the right track but when I run this it doesn't seem to work as intended. When I run it and input my paragraph it returns 0 for both word count and sentence count.

int main(int argc, char const *argv[]) {
char paragraph[100];
int wordcount = 0;
int sentences = 0;

cout << "Enter a paragraph followed by three '@@@'";
cin >> paragraph;

for(int i = 0; i <= 100; i++)
{
if(paragraph[i] ==' '){
wordcount = wordcount + 1;
}
else if(paragraph[i] =='.' ){
sentences = sentences + 1;
}
else if (paragraph[i] =='!'){
sentences = sentences + 1;
}
else if (paragraph[i] == '?'){
sentences = sentences + 1;
}
{
break;
}

}
cout << wordcount;
cout << sentences;
}
this will not do what you want.

-- cin >> skips leading whitespace (spaces, end of lines, ..) and stops reading at the next whitespace, so the spaces never reach your array. if you want spaces, you need to use getline or another approach. (see next bullet: you can skip that and still count, reading one word at a time)
-- cin >> only gets the first token used this way. so you got the first word only, of whatever you typed into it.
-- cin >> writes the word plus a terminating zero character and does not clear the rest of the c-string. whatever junk is in paragraph after that is still there, so if last time you had 100 words with spaces, and this time you had 20 words with spaces, it sees 100 words. You MUST use strlen or the zero character (you can do it with a break, as you tried, or put it in the loop condition) to stop the processing loop (the one checking the c-string). Are you allowed to use c++ strings or forced to use C style?
-- index 100 is out of bounds for a 100-element array. so the loop should be < 100, not <= 100.
-- you can combine if conditions with boolean logic. if( paragraph[i] == '?' || paragraph[i] == '.' ) for example checks two characters.
-- x+=1; //add one to x. same as x= x+1; but less wordy.
-- you don't use the @@@ you asked for.
-- "is word count off by one?" contains 6 words and 5 spaces.
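The first two points can be checked without typing anything in. A minimal sketch, assuming a std::istringstream stands in for cin; the helper names first_token and whole_line are made up for illustration:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: what `stream >> s` stores for a given input.
std::string first_token(const std::string& input)
{
    std::istringstream in(input);
    std::string s;
    in >> s;              // skips leading whitespace, stops at the next whitespace
    return s;
}

// Hypothetical helper: what `getline(stream, s)` stores for the same input.
std::string whole_line(const std::string& input)
{
    std::istringstream in(input);
    std::string s;
    std::getline(in, s);  // keeps the spaces, stops at the end of the line
    return s;
}
```

For the input "blah blah blah." first_token returns just "blah", while whole_line returns the full text, spaces and period included.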

so, what to do?
-- you need two loops. one to read the data, and one to process it. the processing loop is OK (it can be improved, but it's OK) apart from the out of bounds <= problem.

so why does it give zero, though?
input "blah blah blah."
the code you have gets "blah" ... no space, no period. no counts change.
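For the sentinel loop the assignment asks for, one option is to let the read loop itself stop at the @@@ token. This is only a sketch, assuming the sentinel arrives as its own whitespace-separated token; the Counts struct and count_until_sentinel function are names invented here, not part of the original code:

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a keyboard
#include <string>

// Hypothetical result type for this sketch.
struct Counts {
    int words = 0;
    int sentences = 0;
};

// Reads whitespace-separated tokens until the sentinel (or end of input).
// Assumes the sentinel is a token of its own, e.g. "... last word. @@@".
Counts count_until_sentinel(std::istream& in, const std::string& sentinel = "@@@")
{
    Counts c;
    std::string word;
    while (in >> word && word != sentinel) {   // the sentinel loop
        ++c.words;
        const char last = word.back();         // safe: extracted tokens are never empty
        if (last == '.' || last == '!' || last == '?')
            ++c.sentences;
    }
    return c;
}
```

In a real program you would pass std::cin. Feeding it "blah blah blah. @@@" through an istringstream gives 3 words and 1 sentence, and anything typed after the sentinel is ignored.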
Read words one at a time:
string word;
while (cin >> word) {
    ...
}

If a word ends with '.', '!' or '?' then it ends a sentence. A handy way to check if a character is in a set is via the strchr function (http://www.cplusplus.com/reference/cstring/strchr/):
if (strchr(".!?", ch)) {
   // ch is one of the characters
}


Using these methods, the whole program is less than 30 lines of code.
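Put together, the two snippets above might look like the following. This is only a sketch under the thread's simplifying assumption that a word ending in '.', '!' or '?' ends a sentence; the Tally struct and tally function are illustrative names, not the program referred to above:

```cpp
#include <cstring>   // std::strchr
#include <istream>
#include <sstream>   // std::istringstream, for trying it on a fixed string
#include <string>

// Hypothetical result type for this sketch.
struct Tally {
    int words = 0;
    int sentences = 0;
    // Average words per sentence; returns 0 when there are no sentences
    // so that we never divide by zero.
    double average() const {
        return sentences ? static_cast<double>(words) / sentences : 0.0;
    }
};

Tally tally(std::istream& in)
{
    Tally t;
    std::string word;
    while (in >> word) {                       // one word per iteration
        ++t.words;
        if (std::strchr(".!?", word.back()))   // does the last character end a sentence?
            ++t.sentences;
    }
    return t;
}
```

For "Hello. How are you?" this reports 4 words, 2 sentences and an average of 2. One caveat with strchr: it also finds the string's terminating zero character, so strchr(".!?", '\0') is non-null; that should not come up with ordinary typed text.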
I'm just thinking that for a paragraph that contained "Hello. How are you?" there are only 3 spaces but 4 words. There could also be scenarios where people add in extra spaces for whatever reason. I just think there are better ways to go about it. I just added an empty space in my program and it counted it as a word, but the way mine is set up it would be rather effortless to check whether a word was just an empty string. The way you're going, you'd have to check whether you have extra spaces in a row or other shenanigans like that. My solution involves breaking the data down into smaller and smaller containers contained within one big container... Another flaw I can see with your program is the 100 char cap. This paragraph is about 750...
As it stands, if he does it the way I said, the 100 limit is per word, not per total. He was headed in that direction. (I know YOU know this, Markyrocks; it's to clarify for the OP.) If you read the whole paragraph at once, e.g. via a getline, 100 is too small.
Counting spaces is flawed - a lot of people routinely double up the spaces for a new sentence.

You can either let the stream extractor do the work for you, as in @dhayden's
while (cin >> word)
or, if you want to parse one character at a time then you can count a new word whenever you change from white space (including newline) to non-whitespace, presuming you to be in whitespace when you start parsing.

A sentence is horrendously difficult to define, and counting occurrences of one of { . ? ! } doesn't work if you are allowed to use quoted text like:
"Are you a good C++ programmer?", he asked.

or you have people's names:
W.S. Churchill

or numbers:
3.14159

or, like me, you use an ellipsis (...) rather a lot. (I tried Microsoft Word on my previous post and it definitely counted ... as a single word.)
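To see how far off the naive rule gets on those examples, here is a small sketch; count_terminators is a made-up helper that simply counts every '.', '?' and '!' in a string:

```cpp
#include <string>

// Hypothetical helper: counts every '.', '?' or '!' in the text,
// i.e. the naive "one terminator = one sentence" rule.
int count_terminators(const std::string& text)
{
    int n = 0;
    for (char c : text)
        if (c == '.' || c == '?' || c == '!')
            ++n;
    return n;
}
```

It returns 2 for the quoted question, 2 for "W.S. Churchill", 1 for "3.14159" and 3 for "...", although a reader would probably call each of those at most one sentence.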
While Lastchance brings up excellent points, I think we can keep it simple here since this is the beginners forum. Noahkh11, I think you can assume that if a word ends in one of the three characters, it's the end of a sentence. This will overcount in some cases, but it's probably close enough. Getting more accurate could be extremely difficult.
I figured I gave it long enough, so I'll post what I had come up with. Maybe it will give the OP some ideas. I tried to make the process as automatic as possible. It doesn't take into account the special cases that Lastchance brings up. I agree that those special cases are probably beyond the scope of the exercise.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

struct sentence {
	vector<string> words;
	int wordcount;
	string sent;

	sentence(string s)
		:sent(s) {
		this->separate();
		this->count();
	}

	void separate() {
		stringstream ss;
		ss << sent;
		while (1) {
			string str;
			if (getline(ss, str, ' ')) {
				if (str.size()) { words.push_back(str); }
			}
			else { break; }
		}
	}

	void count() { wordcount = words.size(); }
};

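// Total number of words across all sentences.
float totalwords(const vector<sentence>& s) {
	float a{};
	for (const auto& i : s) {
		a += i.wordcount;
	}
	return a;
}

// Number of sentences found.
float totalsentences(const vector<sentence>& s) { return s.size(); }

// Average words per sentence (assumes at least one sentence was found).
float avg_wps(const vector<sentence>& s) { return (totalwords(s) / totalsentences(s)); }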

int main() {
	stringstream ss;
	
	 string paragraph="I am trying to write a program to determine! the number of words, number of sentences? as well as the average words per sentence. Im really unsure of; where to start. Any ideas or suggestions on how to start would be very much appreciated.Thank you, Noah." ;
	 ss << paragraph;
	vector<sentence> vs;

	for (auto i : paragraph) {
		
		if (i=='.' || i == '?' || i == '!' || i == ';') {
			string str;
			if (getline(ss, str, i) && str.size()) {
				vs.push_back(sentence(str));
			}
			
		}
	}
		
	cout << "total sentences= " << totalsentences(vs) << " total words= " << totalwords(vs) << " avg wps= " << avg_wps(vs);
}
Going with what you have, except using c++ strings, and keeping it schoolyard simple, I get this. (Yes, there are plenty of ways to confuse it, but it works on simple text, it works with multiple spaces, and it requires using the @ thingy...)

For C style you can swap s.length() for strlen(s), and it's more or less the same.


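#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s;
    int words = 0;
    int sents = 0;
    bool done = false;
    while (!done && cin >> s)   // sentinel loop: stops at a token ending in '@' (or at end of input)
    {
        if (s.back() == '@')
        {
            done = true;
            s.erase(s.find_last_not_of('@') + 1);   // strip the sentinel characters
            if (s.empty()) break;                   // the token was only the sentinel, not a word
        }
        words++;
        char last = s.back();
        if (last == '.' || last == '?' || last == '!')
            sents++;
    }
    cout << words << " words\n" << sents << " sentences\n";
}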
What actually constitutes a word and a sentence in the English language is quite complicated to describe accurately - and can vary between different 'interpretations'.

A simplistic definition is that a word is a sequence of non-white-space characters (delimited by white-space or end-of-input), and that a sentence ends when a word ends with one of .!? or when end-of-input is reached. This will report falsely for input such as:

A sentence . Another sentence ?? " by gosh ", she said !

which probably should report 8 words and 3 sentences.

As this is a beginner exercise, the simplistic definition is probably what is meant here and is probably 'good enough', although the spec should state what is meant by 'word' and 'sentence'.

So consider:

#include <iostream>
#include <string>
#include <sstream>

int main()
{
	const std::string sentdel {".!?"};	//  Sentence delimiters

	size_t noword {0};
	size_t nosent {0};

	std::istringstream text {"this is a sentence!! And so is this!? and another one"};

	bool endsent {false};

	for (std::string word; text >> word; ++noword) {
		endsent = false;

		if (sentdel.find(word.back()) != std::string::npos) {
			++nosent;
			endsent = true;
		}
	}

	if (endsent == false)
		++nosent;

	std::cout << "No words: " << noword << '\n';
	std::cout << "No sent : " << nosent << '\n';
	std::cout << "Average word/sentence: " << (double)noword / nosent << '\n';
}


which displays:


No words: 11
No sent : 3
Average word/sentence: 3.66667

It won't cope with text-speak, but how about:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cctype>
using namespace std;

int main()
{
   const string endSentence = ".?!";
   istringstream in( "Now. What have we here?\nA C++ forum!" );
   int words = 0, sentences = 0;
   bool space = true;
   for ( char c; in.get( c ); )
   {
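      // isspace needs a value representable as unsigned char, so cast first
      const bool isSpace = isspace( static_cast<unsigned char>( c ) ) != 0;
      if ( space && !isSpace ) words++;
      space = isSpace;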

      if ( endSentence.find( c ) != string::npos ) sentences++;
   }

   cout << "Words: " << words << '\n'
        << "Sentences: " << sentences << '\n'
        << "Average words per sentence: " << ( words + 0.0 ) / sentences << '\n';
}

This ignores the final sentence in the sentence count if it doesn't end with one of .?! If you remove the ! from forum, it reports 8 words but only 2 sentences. IMO this should be 3.
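One way to count that trailing sentence is to remember whether any word has started since the last terminator. The following is a sketch of that idea, not a change anyone posted in the thread; TextCounts, count_text and the pending flag are names assumed for illustration:

```cpp
#include <cctype>
#include <string>

// Hypothetical result type for this sketch.
struct TextCounts {
    int words = 0;
    int sentences = 0;
};

TextCounts count_text(const std::string& text)
{
    const std::string endSentence = ".?!";
    TextCounts r;
    bool space = true;     // previous character was whitespace (or start of input)
    bool pending = false;  // a word has started since the last terminator

    for (char c : text) {
        const bool isSpace = std::isspace(static_cast<unsigned char>(c)) != 0;
        if (space && !isSpace) {   // whitespace -> non-whitespace: a new word
            ++r.words;
            pending = true;
        }
        space = isSpace;

        if (endSentence.find(c) != std::string::npos) {
            ++r.sentences;
            pending = false;
        }
    }
    if (pending)           // the text ran out before a terminator was seen
        ++r.sentences;
    return r;
}
```

With "Now. What have we here?\nA C++ forum" this gives 8 words and 3 sentences, with or without the final '!'. It still overcounts an ellipsis and the other special cases mentioned earlier in the thread.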