Please check my Mode Program

This program finds the mode of a given data set.

By definition, the mode is "the value that appears most often in a set of data". There can be more than one mode, and there is no mode if no value repeats.

The program reads the data set from a text file named "testData.txt", then stores the modes in a vector.

This program will eventually become a member function of a class in a larger project. This standalone version was created for testing and debugging.

Sample run 1:
Given the following text file
1
2
3
3
4
4
5

The Output is
Mode: 3 4


Sample run 2:
Given the following text file
1
2
3
4
5

The Output is
No modes


I've already written working code for the program (or at least, I think I have). However, I would like some feedback on how to improve it.

This is the algorithm that I used:
1. Store all given values in a vector type <double>
2. Create another vector type <int> for storing counts of each value in the first vector.
3. Find the largest count in the second vector
4. Find all values that have the highest count and store each one in a final vector - Ensure that a value is stored in the vector only once.
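Since this will eventually become a member function, here is a minimal sketch of the four steps as a free function. The name findModes and its signature are assumptions made for illustration, not part of the program below, and it returns an empty vector in place of printing "No modes".

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are illustrative only).
// Step 1 is assumed done: the caller passes the values already stored in a vector.
// Returns every value that shares the highest count, each stored once,
// or an empty vector when the input is empty or no value repeats.
std::vector<double> findModes(const std::vector<double>& numbers)
{
    std::vector<double> modes;
    if (numbers.empty())
        return modes;

    // Step 2: one count per index, matching the values vector.
    std::vector<int> occurrences;
    occurrences.reserve(numbers.size());
    for (double value : numbers)
        occurrences.push_back(static_cast<int>(
            std::count(numbers.begin(), numbers.end(), value)));

    // Step 3: the largest count.
    const int maxOccurrence =
        *std::max_element(occurrences.begin(), occurrences.end());
    if (maxOccurrence == 1)
        return modes;   // no value repeats, so there is no mode

    // Step 4: keep each value with the largest count, but only once.
    for (std::size_t i = 0; i < numbers.size(); ++i)
        if (occurrences[i] == maxOccurrence &&
            std::find(modes.begin(), modes.end(), numbers[i]) == modes.end())
            modes.push_back(numbers[i]);

    return modes;
}
```

Returning the modes instead of printing them keeps the counting logic separate from the file reading and the output, which should make it easier to move into a class later.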

Here are some questions.

What would be a better algorithm for finding the mode?
Could I have used the <vector> and <algorithm> libraries more effectively?
How readable is the code?
Do the comments help at all?

#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>

int main ()
{
    std::ifstream infile ( "testData.txt" );

    std::vector <double> numbers;

    double num;

    // Stop as soon as a read fails, so no extra element is ever stored
    while ( infile >> num )
        numbers.push_back ( num );

    if ( numbers.empty () )         // Missing file or no data: nothing to count
    {
        std::cout << "No modes" << std::endl;
        return 0;
    }

    int n = numbers.size ();        // Hold the value of sample size

    std::vector <int> occurrences;  // Hold counts of all values

    occurrences.reserve ( n );      // One count per value

    for ( int i = 0 ; i < n ; i ++ )
        occurrences.push_back ( std::count ( numbers.begin() , numbers.end() , numbers.at(i) ) );

    int maxOccurrence = *std::max_element ( occurrences.begin () , occurrences.end () );    // Find the largest count value

    std::vector <double> storeModes;

    if ( maxOccurrence == 1 )       // No value repeats
    {
        std::cout << "No modes" << std::endl;
        return 0;
    }

    // Store each value that has the largest count, but only once.
    // std::find returns end() when the value is absent, so compare, never dereference.
    for ( int i = 0 ; i < n ; i ++ )
        if ( occurrences.at ( i ) == maxOccurrence )
            if ( std::find ( storeModes.begin() , storeModes.end() , numbers.at(i) ) == storeModes.end () )
                storeModes.push_back ( numbers.at ( i ) );

    std::cout << "Mode: ";
    for ( double mode : storeModes )
        std::cout << mode << " ";

    return 0;
}


To test, copy the code into a .cpp file, then create a text file named "testData.txt" and save some values in it. The file name is a relative path, so the text file must be in the directory the program is run from, which is typically the same directory as the compiled executable.

Thank you all in advance!
1. Store all given values in a vector type <double>
2. Create another vector type <int> for storing counts of each value in the first vector.
If the data is a set of doubles between 0 and 1, how will you represent the histogram in the vector?
This is how it would work for a data set of doubles between 0 and 1.

Let's say that we have the following data set:

0.1 0.2 0.3 0.3 0.4 0.4 0.4 0.4 0.4 0.6 0.9


Then, the first double vector,
 
std::vector <double> numbers;

would store
0.1 0.2 0.3 0.3 0.4 0.4 0.4 0.4 0.4 0.6 0.9


The second int vector,
 
std::vector <int> occurrences;

would store the count of each value in the first vector (at the corresponding index)
1 1 2 2 5 5 5 5 5 1 1
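
A minimal sketch of just this counting step, assuming a hypothetical helper named countOccurrences (not a function in the posted program). It compares doubles with exact equality, which is fine for identical values read from a file, but values produced by arithmetic may not compare equal.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns one count per element of numbers, stored at
// the same index as the value it belongs to. Each std::count call scans the
// whole vector, so the total work grows with the square of the sample size.
std::vector<int> countOccurrences(const std::vector<double>& numbers)
{
    std::vector<int> occurrences;
    occurrences.reserve(numbers.size());
    for (double value : numbers)
        occurrences.push_back(static_cast<int>(
            std::count(numbers.begin(), numbers.end(), value)));
    return occurrences;
}
```

For the data set above, this should yield the counts listed: a repeated value gets its count repeated at every index where it appears.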


Then, find the largest count in the occurrences and store it in
 
int maxOccurrence;

So, in this case, maxOccurrence would be 5

Then, finally, the third vector,
 
std::vector <double> storeModes;

would store all values from the first vector that have a count of 5 in the corresponding index of the second vector
0.4 0.4 0.4 0.4 0.4


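Of course, I want to store 0.4 only once, so I included the following check at line 47:
 
if ( *std::find ( storeModes.begin() , storeModes.end() , numbers.at(i) ) != numbers.at(i) )

Basically, before a value is added to storeModes, the program searches storeModes for that value, and if it is already there, it is not added again. One caveat with the check as written: when the value is not found, std::find returns storeModes.end(), and dereferencing that iterator is undefined behavior. Comparing the returned iterator against storeModes.end() instead of dereferencing it avoids the problem.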
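
A minimal sketch of the store-once check in isolation, written so that the iterator returned by std::find is compared with end() and never dereferenced. The helper name pushUnique is hypothetical and does not appear in the posted program.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: appends value to storeModes only if it is not already
// present. Returns true when the value was appended, false when it was a
// duplicate and the vector was left unchanged.
bool pushUnique(std::vector<double>& storeModes, double value)
{
    // std::find returns end() when nothing matches; compare, never dereference.
    if (std::find(storeModes.begin(), storeModes.end(), value) != storeModes.end())
        return false;

    storeModes.push_back(value);
    return true;
}
```

Calling it five times with 0.4 on an empty vector should append once and reject the other four calls.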

So, finally, the storeModes vector would contain
0.4

which is the mode for this example data set.
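
One possible alternative for the histogram, sketched here as a different technique from the two-vector approach described above: a std::map from each distinct value to its count. The name findModesWithMap is hypothetical. Each value is counted once, so no separate duplicate check is needed, and the modes come out in ascending order.

```cpp
#include <map>
#include <vector>

// Hypothetical alternative: build the histogram as value -> count in a
// std::map, then collect every value whose count equals the largest count.
// Returns an empty vector when the input is empty or no value repeats.
std::vector<double> findModesWithMap(const std::vector<double>& numbers)
{
    std::map<double, int> counts;
    for (double value : numbers)
        ++counts[value];            // operator[] inserts a zero count on first use

    int maxCount = 0;
    for (const auto& entry : counts)
        if (entry.second > maxCount)
            maxCount = entry.second;

    std::vector<double> modes;
    if (maxCount <= 1)
        return modes;               // nothing repeats, so there is no mode

    for (const auto& entry : counts)
        if (entry.second == maxCount)
            modes.push_back(entry.first);

    return modes;
}
```

The trade-off is that the map keeps its keys sorted, so the modes are no longer reported in the order they first appear in the file.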


I hope that answers your question.

I am sure there is a better way to program this. I am not very experienced with C++ or problem-solving. I came up with this by myself, and I am looking for ways to improve the code.