Word Frequencies

Hello everyone. I'm working on a CS assignment and struggling to figure out just how to approach it.

The gist of the assignment is to enter an integer followed by a character array of fewer than 20 words (each with fewer than 10 letters). The program should then output each word followed by the number of times it appears in the cstring.

So if the input is:

5 hey hi Mark hi mark

Then, the output is:

hey 1
hi 2
Mark 1
hi 2
mark 1

Currently, I am feeling pretty lost. I've hunted around a number of places for a good way to get this done (including this forum), but I am still lacking a lot of intuition here. What I know is that this will require an array for the strings and an array for the frequencies, but I'm at a loss for how exactly to implement that, and I'm not even sure whether I should be using character arrays or just regular strings.

My rudimentary code just outputs the input without the number on the front. It looks like this:

#include <iostream>
#include <cstring>
#include <string>
#define MAX_INPUTS 200
using namespace std;

int main() {

int outnum;

int i;

int j;

char words[20];

char input[MAX_INPUTS];

cin >> outnum;
cin.getline(input, MAX_INPUTS);

cout << input;
   return 0;
}


For reference, if fed the earlier input, this outputs " hey hi Mark hi mark".

I really feel like I'm lacking some critical knowledge here.
Break the problem down into smaller steps.

For example, the first step you could do would be to write the code to print this.


word count = 5
found word = hey
found word = hi
found word = Mark
found word = hi
found word = mark


When you can do that, then you can think about what to do next.
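
As a rough illustration of that first step, here is a minimal sketch. The function names `extract_words` and `print_found_words` are made up for this example, and it assumes `std::vector` and `std::istringstream` are allowed, which may not match what the course has covered. It reads the leading count, then pulls out that many words with `operator>>`, which skips whitespace on its own.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption, not from the thread).
// Reads the leading count from `input`, then extracts up to that many words.
std::vector<std::string> extract_words(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> words;

    int expected = 0;
    if (!(in >> expected)) return words;  // no leading integer: nothing to read

    std::string word;
    // operator>> skips whitespace, so no manual splitting on ' ' is needed.
    while (static_cast<int>(words.size()) < expected && in >> word) {
        words.push_back(word);
    }
    return words;
}

// Prints the step-one report in the format suggested above.
void print_found_words(const std::vector<std::string>& words) {
    std::cout << "word count = " << words.size() << '\n';
    for (const std::string& w : words) {
        std::cout << "found word = " << w << '\n';
    }
}
```

Calling `print_found_words(extract_words(line))` from `main`, with `line` read via `std::getline(std::cin, line)`, should produce a report in that format for the sample input.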

Are you allowed to use things like https://www.cplusplus.com/reference/map/ ?
The problem is close to trivial with something like this.
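
To give an idea of why, here is a minimal sketch using `std::map`. The function name `count_words` is hypothetical, and whether `<map>` is permitted for the assignment is an open question.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts how often each word occurs after the leading count.
std::map<std::string, int> count_words(const std::string& input) {
    std::istringstream in(input);
    std::map<std::string, int> counts;

    int expected = 0;
    in >> expected;  // stays 0 if the input does not start with an integer

    std::string word;
    for (int i = 0; i < expected && in >> word; ++i) {
        ++counts[word];  // operator[] inserts a zero count the first time a word is seen
    }
    return counts;
}
```

Note that iterating over a `std::map` visits the keys in sorted order (on an ASCII system: Mark, hey, hi, mark), not in input order. To match the assignment's output, the words would also have to be kept in their original order and each count looked up when printing.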


I don't know if we were allowed to use <map> or not, but given that your post is the first I've ever heard of it, I'd guess it hasn't been part of the curriculum so far, and going outside of that is at least discouraged.

I ended up asking a CS major buddy of mine, and we came up with something that looks like this:

#include <iostream>
#include <string>
using namespace std;

int main() {
   // Read the whole input line, e.g. "5 hey hi Mark hi mark".
   string line;
   getline(cin, line);

   // Split the line on single spaces. words[0] holds the leading number,
   // so the array needs room for 20 words plus that number.
   // Assumes single spaces and at most 20 words, as the assignment states.
   int wordCount = 0;
   string words[21];
   string currentWord = "";
   for (size_t i = 0; i < line.length(); i++)
   {
      if (line[i] == ' ')
      {
         words[wordCount] = currentWord;
         wordCount++;
         currentWord = "";
      }
      else
      {
         currentWord += line[i];
      }
   }
   words[wordCount] = currentWord;
   wordCount++;

   // Skip index 0 (the number) and count how often each word appears.
   for (int i = 1; i < wordCount; i++)
   {
      int otherWordCount = 0;
      for (int j = 1; j < wordCount; j++)
      {
         if (words[i] == words[j])
         {
            otherWordCount++;
         }
      }
      cout << words[i] << " " << otherWordCount << endl;
   }
}


It works, but like I said, I still feel there's some critical understanding I'm just lacking here. I can't point to anything specific as to what that is, because I know what each of these things mean individually, but I just can't seem to see the forest for the trees.
How this is expected to be done, as opposed to how it could be done, depends very much on what has already been taught. There are various ways to split a string into its words and different techniques to produce the word count.

However, one way to do this type of problem is to accumulate the count as you extract each word. Consider without using any 'fancy' C++ classes/functions:

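#include <iostream>
#include <string>

int main()
{
	const size_t maxwrds {20};
	std::string words[maxwrds] {};	// distinct words seen so far
	size_t count[maxwrds] {};	// count[i] is the tally for words[i]
	size_t nowrds {};		// number of words the user will enter
	std::string wrd;
	size_t gotwrds {};		// number of distinct words stored

	std::cout << "Enter number of words: ";
	std::cin >> nowrds;

	std::cout << "Enter the words\n";

	// Stop at maxwrds so the arrays cannot overflow
	for (size_t w {}; (w < nowrds) && (w < maxwrds) && (std::cin >> wrd); ++w) {
		bool got {};

		// Only the first gotwrds slots hold words, so search just those
		for (size_t w1 {}; w1 < gotwrds; ++w1)
			if (words[w1] == wrd) {
				++count[w1];
				got = true;
				break;
			}

		// First time this word has been seen: store it with a count of 1
		if (!got) {
			words[gotwrds] = wrd;
			count[gotwrds++] = 1;
		}
	}

	for (size_t w = 0; w < gotwrds; ++w)
		std::cout << words[w] << ' ' << count[w] << '\n';
}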



Enter number of words: 5
Enter the words
hey hi Mark hi mark
hey 1
hi 2
Mark 1
mark 1



A little advanced, but leading you down the thought process a bit:

How would you do it if you were counting a million integers, each in the range 0 to 1000?
unsigned int x[1001]{}; // one slot per value, 0 through 1000
x[number_being_counted]++; // do you see how this might work?
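
Spelled out a little more, the idea might look like the sketch below. The function name `tally` and the choice to ignore out-of-range values are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: tallies values in the range 0..1000 inclusive.
// Values outside that range are ignored in this sketch.
std::vector<unsigned int> tally(const std::vector<int>& numbers) {
    std::vector<unsigned int> counts(1001, 0);  // one slot per value 0..1000
    for (int n : numbers) {
        if (n >= 0 && n <= 1000) {
            ++counts[static_cast<std::size_t>(n)];  // the value itself is the index
        }
    }
    return counts;
}
```

No searching is needed at all: each number goes straight to its own counter, so the cost per number stays the same no matter how many numbers are counted.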

Once you understand the above:
What if you could find a way to convert a word into an integer from 0 to 1000?
This is called a hash function (overly simple explanation: a hash function converts "data" (could be anything) into (usually) an integer value). Crafting one that works can be tricky, since two different words can land on the same integer, but there are plenty of resources online for simple but effective ones.
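
For illustration only, here is a sketch of a hand-rolled version. Everything in it is an assumption rather than something from the thread: the names `hash_word`, `WordTable`, `add_word` and `count_of`, the multiplier 31, the table size of 1001, and the use of linear probing to deal with two words hashing to the same slot.

```cpp
#include <cstddef>
#include <string>

const std::size_t kBuckets = 1001;  // table size, assumed for this sketch

// A simple multiplicative string hash. The result is in the range 0..1000.
std::size_t hash_word(const std::string& word) {
    std::size_t h = 0;
    for (unsigned char c : word) {
        h = (h * 31 + c) % kBuckets;
    }
    return h;
}

struct WordTable {
    std::string keys[kBuckets];          // an empty string marks an unused slot
    unsigned int counts[kBuckets] = {};  // counts[i] belongs to keys[i]
};

// Adds one occurrence of `word`. Two different words can hash to the same
// slot, so on a collision this steps forward to the next slot (linear probing).
// Assumes non-empty words and fewer than kBuckets distinct words.
void add_word(WordTable& table, const std::string& word) {
    std::size_t i = hash_word(word);
    while (!table.keys[i].empty() && table.keys[i] != word) {
        i = (i + 1) % kBuckets;
    }
    table.keys[i] = word;
    ++table.counts[i];
}

// Returns how many times `word` was added (0 if it was never added).
unsigned int count_of(const WordTable& table, const std::string& word) {
    std::size_t i = hash_word(word);
    while (!table.keys[i].empty() && table.keys[i] != word) {
        i = (i + 1) % kBuckets;
    }
    return table.keys[i].empty() ? 0 : table.counts[i];
}
```

The collision handling is the tricky part: without the probing loop, two different words that happen to share a hash value would silently share one counter.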

Once you understand *that*, consider whether C++ has anything like it you can use off the shelf (it does, see the post above on maps).

Storing each word in a container with a count and then fishing it back out with the counted amount is the right way. The above refines this idea by optimizing the search so you don't spend all day iterating if you were processing something like the entire "Mission Earth" series.
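
As a sketch of the off-the-shelf route, the example below uses `std::unordered_map`, which is the hash-based container in the standard library (`std::map` keeps its keys sorted instead). The function name `frequency_lines` is hypothetical. It keeps the words in input order and attaches the total count to each one, which is the output format shown in the original question.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns one "word count" line per input word, in input order.
std::vector<std::string> frequency_lines(const std::string& input) {
    std::istringstream in(input);
    int expected = 0;
    in >> expected;  // stays 0 if the input does not start with an integer

    std::vector<std::string> words;               // words in the order they were entered
    std::unordered_map<std::string, int> counts;  // hash table from the standard library

    std::string word;
    for (int i = 0; i < expected && in >> word; ++i) {
        words.push_back(word);
        ++counts[word];
    }

    // Second pass: every word, including repeats, gets its total count.
    std::vector<std::string> lines;
    for (const std::string& w : words) {
        lines.push_back(w + " " + std::to_string(counts[w]));
    }
    return lines;
}
```

For the sample input "5 hey hi Mark hi mark" this yields the lines "hey 1", "hi 2", "Mark 1", "hi 2", "mark 1".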