[code review] mode of integers

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
// Finds the mode of input integers

#include <iostream>
#include <vector>
#include <algorithm>

int main() {

    std::vector<int> data_elements;
    
    int data_element;
    while (std::cin >> data_element) {
        data_elements.push_back(data_element);
    }

    std::sort(data_elements.begin(), data_elements.end());

    int mode = { data_elements[0] };
    int mode_frequency = { 1 };
    int element_frequency = { 0 };
    int prev_element = { data_elements[0] };

    for (int data_element : data_elements) {
        if (data_element == prev_element) {
            ++element_frequency;
            if (element_frequency > mode_frequency) {
                mode = data_element;
                mode_frequency = element_frequency;
            }
        }
        else {
            element_frequency = { 1 };
        }
        prev_element = data_element;
    }

    std::cout << mode << " is the mode with frequency: " 
        << mode_frequency;

    return 0;
}


I appreciate suggestions. Is there perhaps a different way to find the mode?
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#include <iostream>
#include <vector>
#include <map>
using namespace std;

int main()
{
   vector<int> test = { 1, 2, 4, 5, 4, 2, 3 };

   map<int,int> freq;
   for ( auto e : test ) freq[e]++;
   
   int freqMax = -1;
   for ( auto pr : freq ) if ( pr.second > freqMax ) freqMax = pr.second;
   
   cout << "The modal frequency is " << freqMax << " for value(s) ";
   for ( auto pr : freq ) if ( pr.second == freqMax ) cout << pr.first << ' ';
}


The modal frequency is 2 for value(s) 2 4  
Last edited on
@lastchance, that's awesome!

Do you generally use auto for all ranged for loops?
Code review: What happens if data_elements.size() stays as 0 past line 12?

I think something like finding the mode of a data set is generally O(n log n). You might be able to get that closer to linear time in practice by using a hash set (unordered map), which has O(1) complexity on average (unless it needs to do a rehash).

But a sort + linear traversal may be the fastest method for small-to-medium-sized data sets because of cache locality, which you will not have with a std::map or std::unordered_map. If you're worried about performance, then be sure to measure the difference between implementations on the kind of data you expect to process; everything else is just theory.
Last edited on
@Ganado

Code review: What happens if data_elements.size() stays as 0 past line 12?

An error from trying to access data_elements[0]. Thanks!

I think something like finding the mode of a data set is generally O(n log n). You might be able to get that closer to linear time in practice by using a hash set (unordered map), which has O(1) complexity on average (unless it needs to do a rehash).

But a sort + linear traversal may be the fastest method for small-to-medium-sized data sets because of cache locality, which you will not have with a std::map or std::unordered_map. If you're worried about performance, then be sure to measure the difference between implementations on the kind of data you expect to process; everything else is just theory.

Thanks for the comprehensive insights! I definitely need to learn more about hash sets!
Do you generally use auto for all ranged for loops?


Actually, no. If I'm sure of the type and it's simple then I prefer to write it explicitly.

However, in this instance I was hedging my bets in case you decided to change the type of the data! It's also useful in template fuinctions.
@lastchance

Thoughtful! Thanks.
Topic archived. No new replies allowed.