Trying To Make A Program That Returns The Number Of Instances Of Each Unique Word

Howdy guys,

I'm trying to make a program that takes text as input. The program then
finds each unique word (ignoring capitalization and punctuation) and
returns the number of occurrences of each word.

I was messing around with pointers and malloc while coding this, and I think
I might have gotten in a little over my head :-D.

This is the code I have so far, but whenever I run it, even with one word,
I get a segmentation fault. I'm also a tad hazy on how to create a pointer
to an array of structs, and how to pass that array to a function.

Any help with why I am getting a segmentation fault, or with how to create
a pointer to an array of structs, would be appreciated.
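On the second question, a minimal sketch of one common way to do it. It reuses the post's singleWord struct; makeList and totalCount are hypothetical names for illustration. "Pointer to an array of structs" here means a singleWord* pointing at the first element, which is what malloc gives back. A function that receives it also needs the element count, because the pointer alone does not carry the array's length.

```cpp
#include <cstdlib>

// Same struct as in the post.
struct singleWord
{
    char *w;
    int count;
};

// Hypothetical helper: allocates `capacity` structs and returns a pointer
// to the first one, or NULL if malloc fails.
singleWord *makeList(int capacity)
{
    // malloc returns void*, which C++ will not convert implicitly.
    singleWord *list = static_cast<singleWord *>(std::malloc(capacity * sizeof(singleWord)));
    if (list == NULL)   // == compares; a single = would assign
        return NULL;
    for (int i = 0; i < capacity; i++)
    {
        list[i].w = NULL;   // malloc'd memory is not zeroed, so set each field
        list[i].count = 0;
    }
    return list;
}

// Hypothetical helper: the array arrives as a pointer to its first element,
// so the number of used elements is passed alongside it.
int totalCount(const singleWord *list, int size)
{
    int total = 0;
    for (int i = 0; i < size; i++)
        total += list[i].count;   // list[i] means *(list + i)
    return total;
}
```

Usage would look like `singleWord *listOWords = makeList(10);`, then `totalCount(listOWords, size);`, and finally `std::free(listOWords);` once you are done.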

/***************************************************************************************
*
*Wordcounter - Receives text from standard input and counts the number of occurrences
*of each unique word (capitalization and punctuation excluded). Then returns a list of
*unique words and the number of times each occurred.  Not finished.
*
*
***************************************************************************************/



#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <cctype>

using namespace std;

struct singleWord
{
    char *w;
    int count;
};

char * getWord(void);
void analyzeWord(char *, singleWord *, int);
void checkMem(singleWord *, int);

int BUFFER_SIZE = 10;
int CHUNK_SIZE = 5;

int main()
{

    singleWord *listOWords = (singleWord *) malloc(BUFFER_SIZE * sizeof(singleWord));
    int size = 0;
    if (listOWords = NULL)
    {
        cout << "Memory exceeded.";
        exit(1);
    }
    analyzeWord(getWord(), listOWords, size);
    
    

}

/*Takes input until a space or newline char is entered, at which point it converts
  the input into a char * and then returns it */

char * getWord()

{
    int chunksize = 4, buffersize = 5;
    char *word;
    word = (char *)malloc(buffersize * sizeof (char));
    if(word == NULL)
    {
        cout << "Memory exceeded";
        exit(1);
    }
    int count = 0;
    bool keepGoing = true;
    do
    {
        char input;
        cin.get(input);
        if(input != ' ' && input != '\n')
        {
            tolower(input);
            if(strlen(word) == buffersize)
            {
                buffersize += chunksize;
                char *temp = (char *)realloc(word, buffersize);
                if(temp == NULL)
                {
                    cout << "Cannot allocate more memory";
                    exit(1);
                }
                else word = temp;
            }
            word[count] = input;
            count++;
        }
        else keepGoing = false; 
    }
    while(keepGoing);
    return word;
}

/*Takes a word, an array of singleWord structs, and the size of said array as
  input.  If the word is not contained in the singleWord array, creates a new entry.
  If the word is contained, increments singleWord.count for the appropriate entry. */

void analyzeWord(char *word, singleWord *listOWords, int size)
{
    bool wordExists = false;
    for(int i =0; i < size; i++)
    {
        if(strcmp(word, listOWords[i].w))
        {
            listOWords[i].count++;
            wordExists = true;
            
        }
    }
    if(!wordExists)
    {
        checkMem(listOWords, size);
        strcpy(word, listOWords[size].w);
    }
}

/*Checks to make sure there is enough memory to store another word in the array
  of singleWords.  If there is insufficient memory, reallocates memory */

void checkMem(singleWord *listOWords, int size)
{
    if(size = (BUFFER_SIZE - 2))
    {
        singleWord *temp = (singleWord *) realloc(listOWords, (CHUNK_SIZE * sizeof(singleWord)));
        BUFFER_SIZE += CHUNK_SIZE;
        if(temp == NULL)
            {
                cout << "Insufficient Memory";
                exit(1);
            }
        else listOWords = temp;
    }
}


If you read all that, thanks for taking the time.
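Regarding passing the array to a function that may need to grow it: a sketch of what checkMem could look like if the caller tracks the capacity itself. In the posted checkMem, `listOWords = temp` only changes the function's local copy of the pointer, `size = (BUFFER_SIZE - 2)` assigns instead of compares, and realloc is handed the chunk size rather than the full new size. ensureRoom and its parameters are hypothetical; passing the pointer by reference (`singleWord *&`) is what lets the caller see the new block.

```cpp
#include <cstdlib>

struct singleWord
{
    char *w;
    int count;
};

// Hypothetical replacement for checkMem: makes sure list has room for the
// entry at index `size`, growing it by `chunk` entries when it is full.
// list and capacity are references, so the caller sees the updated values.
bool ensureRoom(singleWord *&list, int &capacity, int size, int chunk)
{
    if (size < capacity)   // comparison, not assignment
        return true;
    int newCapacity = capacity + chunk;
    // realloc takes the total new size in bytes, not the increment.
    singleWord *temp = static_cast<singleWord *>(
        std::realloc(list, newCapacity * sizeof(singleWord)));
    if (temp == NULL)
        return false;      // the old block is still valid and still the caller's
    list = temp;
    capacity = newCapacity;
    return true;
}
```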
You have a few functions in your code. The most primitive, tried-and-tested approach is to print the arguments on entry to each function and the return value on exit. That narrows down which function the segmentation fault occurs in, and you can troubleshoot from there.
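A minimal sketch of that tracing idea. The Trace class and addCounts function are hypothetical stand-ins; in practice you would drop a Trace object at the top of getWord, analyzeWord and checkMem. Logging goes to std::cerr by default because it is unbuffered, whereas output still sitting in cout's buffer can be lost when the program crashes.

```cpp
#include <iostream>
#include <sstream>   // std::ostringstream, handy for capturing the log in a test
#include <string>

// Hypothetical helper: prints "enter <name>" when created and "exit <name>"
// when it goes out of scope, i.e. on every return path of the function.
class Trace
{
public:
    Trace(const std::string &name, std::ostream &out)
        : name_(name), out_(out)
    {
        out_ << "enter " << name_ << std::endl;   // endl flushes immediately
    }
    ~Trace() { out_ << "exit " << name_ << std::endl; }

private:
    std::string name_;
    std::ostream &out_;
};

// Hypothetical traced function: logs its arguments and its return value.
int addCounts(int a, int b, std::ostream &logOut = std::cerr)
{
    Trace t("addCounts", logOut);
    logOut << "  args: " << a << ", " << b << std::endl;
    int result = a + b;
    logOut << "  returns: " << result << std::endl;
    return result;
}
```

If the last line you see is an "enter" with no matching "exit", the crash happened inside that function.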
I see you program in C++, so why don't you use
std::string s;
cin >> s;

to retrieve a word, instead of all this messy memory manipulation?
Are you forbidden to use string?
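Along those lines, a sketch of the whole word count with std::string and std::map instead of hand-managed char buffers. normalize and countWords are hypothetical names. normalize drops every punctuation character (so "don't" becomes "dont"), which is an assumption about what "excluding punctuation" should mean.

```cpp
#include <cctype>
#include <iostream>
#include <map>
#include <sstream>   // std::istringstream, handy for feeding test text to countWords
#include <string>

// Lowercase a word and drop its punctuation characters.
std::string normalize(const std::string &raw)
{
    std::string out;
    for (char c : raw)
    {
        unsigned char uc = static_cast<unsigned char>(c);   // ctype functions need a non-negative value
        if (!std::ispunct(uc))
            out += static_cast<char>(std::tolower(uc));
    }
    return out;
}

// Count each distinct normalized word read from `in`.
std::map<std::string, int> countWords(std::istream &in)
{
    std::map<std::string, int> counts;
    std::string s;
    while (in >> s)                // >> skips whitespace and reads one word
    {
        std::string w = normalize(s);
        if (!w.empty())
            ++counts[w];           // operator[] starts a missing entry at 0
    }
    return counts;
}
```

In main, `for (const auto &p : countWords(std::cin)) std::cout << p.first << ": " << p.second << '\n';` prints the result. std::map keeps its keys sorted, so the list comes out in alphabetical order without a separate sort.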
For C++ this...
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <cctype> 

...should be this:
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <cctype> 

Also, if the assignment allows it, you should avoid char* and use std::string.

EDIT:

Also, don't use malloc in C++; use new[].

So this...
    char *word;
    word = (char *)malloc(buffersize * sizeof (char));

...should be this:
 
    char *word = new char[buffersize];

And when you're finished with it:
 
    delete[] word;
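One thing new[] lacks is a counterpart to realloc, so a growing buffer like the one in getWord has to allocate a larger block, copy the old contents over, and delete[] the old block. A sketch, with growBuffer as a hypothetical helper that assumes newSize >= oldSize:

```cpp
#include <cstring>

// Hypothetical helper: replaces `old` (oldSize chars, from new[]) with a
// larger new[] block holding the same characters, and returns the new block.
char *growBuffer(char *old, int oldSize, int newSize)
{
    char *bigger = new char[newSize];   // plain new[] throws std::bad_alloc instead of returning NULL
    std::memcpy(bigger, old, oldSize);  // copy the characters stored so far
    delete[] old;                       // memory from new[] must be released with delete[]
    return bigger;
}
```

Inside getWord this would be used as `word = growBuffer(word, buffersize, buffersize + chunksize); buffersize += chunksize;`.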

ok .. to start with

if (listOWords = NULL) uses the assignment operator; the comparison needs ==.
if(strlen(word) == buffersize) gives the wrong length.
word has just been malloc'd and holds no terminating '\0', so strlen cannot calculate its length.
instead of int chunksize = 4, buffersize = 5; use #define.
word = temp; temp empty ..
Please correct these. This is just for one function.
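Applying those points to getWord, a sketch that keeps the malloc/realloc approach but tracks the length with count instead of strlen, always leaves room for the terminating '\0', and uses #define for the sizes. It also stores the result of tolower, which returns the converted character rather than changing its argument (a fix beyond the list above), and checks whether cin.get succeeded, which the posted loop does not. Reading from a std::istream parameter instead of cin directly, and returning NULL on allocation failure instead of calling exit, are assumptions made so the function is easy to test; the macro names are hypothetical.

```cpp
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <sstream>   // std::istringstream, to feed getWord fixed text

#define WORD_START_SIZE 5   // #define instead of local ints
#define WORD_CHUNK_SIZE 4

// Sketch: reads one word from `in` (stopping at a space, newline or end of
// input) and returns it as a malloc'd, '\0'-terminated, lowercase string.
// Returns NULL if memory runs out. The caller must free() the result.
char *getWord(std::istream &in)
{
    int buffersize = WORD_START_SIZE;
    char *word = (char *)std::malloc(buffersize);
    if (word == NULL)
        return NULL;

    int count = 0;   // characters stored so far; strlen can't tell us this yet
    char input;
    while (in.get(input) && input != ' ' && input != '\n')
    {
        if (count + 1 >= buffersize)   // keep one byte free for the '\0'
        {
            buffersize += WORD_CHUNK_SIZE;
            char *temp = (char *)std::realloc(word, buffersize);
            if (temp == NULL)
            {
                std::free(word);   // on failure the old block is still ours to free
                return NULL;
            }
            word = temp;   // realloc copied the existing characters across
        }
        // tolower returns the converted character; it does not modify input.
        word[count++] = (char)std::tolower((unsigned char)input);
    }
    word[count] = '\0';   // terminate so strlen/strcmp/strcpy work later
    return word;
}
```

From main it would be called as `char *w = getWord(std::cin);`, followed by `std::free(w);` once the word has been counted.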


Lol. Whoops, thanks.


I've been using char * instead of string because we never covered
C-strings in the class I took last semester, and I'd like to learn how to
use them. Same with malloc and realloc; we never covered those.

So I guess this is mostly an exercise in programming in C, but I made it
a C++ file because I like to use cout and cin.

So I modified my code to just use an array of 100 singleWord structs, so I don't
have to worry about increasing or decreasing the size of the array.

Then I noticed a problem when I ran a loop of getWord and analyzeWord: I got
a segmentation fault. I realized I never allocated memory for the words stored
in my array of structures.

The program now runs how I would like it to, but I am confused about some of the
lower-level stuff that goes on when using char * and an array of structs that contain
a char *.

So I changed my code to this:

/***************************************************************************************
*
*Wordcounter - Receives text from standard input and counts the number of occurrences
*of each unique word (capitalization and punctuation excluded). Then returns a list of
*unique words and the number of times each occurred.  Not finished.
*
*
***************************************************************************************/

/*TODO
--Implement a function that displays the words and their respective counts
--Implement a function that removes leading and ending punctuation from words
--Implement a function that sorts the list of words and their counts in alphabetical
  order
  */


#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <cctype>

using namespace std;

struct singleWord
{
    char *w;
    int count;
};

char * getWord(void);
int analyzeWord(char *, singleWord *, int);
void loop(singleWord *, int &);

int BUFFER_SIZE = 10;
int CHUNK_SIZE = 5;

int main()
{
    singleWord list[100];
    int size = 0;
    if(list == NULL)
    exit(1);
    loop(list, size);
    cout << size << endl;
    for(int i = 0; i < size; i++)
    {
    cout << list[i].w << ": " << list[i].count << endl;
    }
}

/*Takes input until a space or newline char is entered, at which point it converts
  the input into a char * and then returns it */

char * getWord()

{
    int chunksize = 4, buffersize = 5;
    char *word;
    word = (char *)malloc(buffersize * sizeof (char));
    if(word == NULL)
    {
        cout << "Memory exceeded";
        exit(1);
    }
    int count = 0;
    bool keepGoing = true;
    do
    {
        char input;
        cin.get(input);
        if(input != ' ' && input != '\n')
        {
            tolower(input);
            if(strlen(word) == buffersize)
            {
                buffersize += chunksize;
                char *temp = (char *)realloc(word, buffersize);
                if(temp == NULL)
                {
                    cout << "Cannot allocate more memory";
                    exit(1);
                }
                else word = temp;
            }
            word[count] = input;
            count++;

        }
        else 
        {
        keepGoing = false;
        }
    }
    while(keepGoing);
    return word;
}

/*Takes a word, an array of singleWord structs, and the size of said array as
  input.  If the word is not contained in the singleWord array, creates a new entry.
  If the word is contained, increments singleWord.count for the appropriate entry. */

int analyzeWord(char *word, singleWord listOWords[], int size)
{
    bool wordExists = false;
    for(int i = 0; i < size; i++)
    {
        if(strcmp(word, listOWords[i].w) == 0)
        {
            listOWords[i].count++;
            wordExists = true;
            return 0;
            
        }
    }
    if(!wordExists)
    {
        cout << "woot" << endl;
        listOWords[size].w = (char *) malloc(sizeof(char) * strlen(word));
        strcpy(listOWords[size].w, word);
        listOWords[size].count = 1;
        return 1;
    }
}

/* Runs getWord and analyzeWord until the user inputs "@@@".
   Used to get as many words from the user as they choose to input. */

void loop(singleWord listOWords[], int &size)
{
    char *tmp = getWord();
    if(strcmp(tmp, "@@@") == 0)
    return;
    else
    {
    if(analyzeWord(tmp, listOWords, size) == 1)
    size++;
    loop(listOWords, size);
    }
}
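For the TODO list at the top of that code, a sketch of the punctuation-stripping and sorting steps on the same singleWord array. stripPunct, alphabetical, sortWords and displayWords are hypothetical names, and std::sort on the raw array stands in for a hand-written sort.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>
#include <iostream>

struct singleWord
{
    char *w;
    int count;
};

// Remove leading and trailing punctuation in place, e.g. "(hello!)" -> "hello".
void stripPunct(char *word)
{
    std::size_t len = std::strlen(word);
    std::size_t start = 0;
    while (start < len && std::ispunct((unsigned char)word[start]))
        start++;
    std::size_t end = len;
    while (end > start && std::ispunct((unsigned char)word[end - 1]))
        end--;
    // Source and destination overlap, so memmove rather than memcpy.
    std::memmove(word, word + start, end - start);
    word[end - start] = '\0';
}

// Ordering used by std::sort: compares the C-strings, not the pointers.
bool alphabetical(const singleWord &a, const singleWord &b)
{
    return std::strcmp(a.w, b.w) < 0;
}

void sortWords(singleWord list[], int size)
{
    std::sort(list, list + size, alphabetical);   // raw pointers work as iterators
}

void displayWords(const singleWord list[], int size, std::ostream &out)
{
    for (int i = 0; i < size; i++)
        out << list[i].w << ": " << list[i].count << '\n';
}
```

stripPunct would run on each word before analyzeWord looks it up, and `sortWords(list, size); displayWords(list, size, std::cout);` would replace the printing loop in main.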


My question now mostly revolves around the malloc call in analyzeWord:
listOWords[size].w = (char *) malloc(sizeof(char) * strlen(word));

--What happens when I initialize an array of a struct that contains a char *?

--Why did I have to use malloc instead of realloc on that line?
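On both questions, a sketch under the assumption that the array is zero-initialized. A local `singleWord list[100];` with no initializer leaves each w holding an indeterminate value, not NULL. realloc needs either NULL or a pointer previously returned by malloc/realloc, so calling it on those garbage pointers is undefined behavior, while malloc never looks at the old value. Writing `singleWord list[100] = {};` sets every w to NULL and every count to 0, and realloc(NULL, n) behaves like malloc(n). Separately, malloc(strlen(word)) is one byte short, because strlen does not count the terminating '\0' that strcpy writes. setEntry is a hypothetical helper:

```cpp
#include <cstdlib>
#include <cstring>

struct singleWord
{
    char *w;
    int count;
};

// Hypothetical helper: stores a copy of `word` in `entry`. Using realloc is
// only safe because entry.w is either NULL or a block from malloc/realloc.
bool setEntry(singleWord &entry, const char *word)
{
    // strlen does not count the '\0', so allocate one extra byte for it.
    char *copy = static_cast<char *>(std::realloc(entry.w, std::strlen(word) + 1));
    if (copy == NULL)
        return false;
    std::strcpy(copy, word);   // strcpy(destination, source)
    entry.w = copy;
    entry.count = 1;
    return true;
}
```

When the program ends, each stored string can be released with `free(list[i].w)`; calling free on a NULL pointer does nothing, so unused zero-initialized entries are safe to pass as well.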