Counting Sort

Mar 23, 2019 at 3:43pm

Hello guys!

I'm using a count sort algorithm to sort certain number in the range of 0 to 50. Then display the numbers + their frequency. The problem is they get sorted by index not by frequency. For better idea i will show the code below and the Input im inputing and the output i want to get.

  #include <stdio.h>
#include <stdlib.h>
#define MAXN 1000000
#define K 50
int a[MAXN],N,b[K+1];

bool mycmp(int i, int j)
{
    return(i>j);
}

int main()
{
    int i,j,l=0;
    scanf("%d", &N);
    for(i=0;i<=K;i++) b[i]=0;
    for(i=1;i<=N;i++)
    {
        scanf("%d", &j);

        b[j]++;
    }
    
    for(i=0;i<=K;i++){
        if(b[i]>0)
        {
            for(j=1; j<=b[i];j++)
            {
                a[l++]=i;
            }

        }
    }

    for(i=0;i<=K;i++)
    {
    
            if(b[i] > 0)
            {
                
                 printf("%d %d\n", i, b[i]);
            }
    }
    
}

Edit & run on cpp.sh

Input :

1
2
3

9

42 42 34 26 42 35 34 47 47

Output that i get :

But the output i want to get is :

I get the idea behind the count sorting, but for some reason i can't figure out how to display it based on the frequency(how to modify the print).

Last edited on Mar 23, 2019 at 3:44pm

Mar 23, 2019 at 4:09pm

MikeStgt (466)

I see no other way than after sort count in a first step to sort in a second one by count found in first step.

Mar 23, 2019 at 4:19pm

FreeSocks (35)

I'm not sure i follow exactly what you mean by sort in a second one.

Mar 23, 2019 at 4:41pm

dutch (2548)

He means you need to sort your count array by the counts. Actually, you will also have to keep track of the original indices. So something like this:

#include <stdio.h>
#include <stdlib.h>

#define MAXVALUE 50

typedef struct {
    int index;
    int count;
} Count;

int cmp_count(const void *a, const void *b) {
    return ((Count*)b)->count - ((Count*)a)->count;
}

int main() {
    Count cnt[MAXVALUE + 1];
    for (int i = 0; i <= MAXVALUE; ++i) {
        cnt[i].index = i;
        cnt[i].count = 0;
    }

    int n;
    scanf("%d", &n);

    for (int i = 0; i < n; ) {
        int x;
        scanf("%d", &x);
        if (x >= 0 && x <= MAXVALUE) {
            ++cnt[x].count;
            ++i;
        }
        else
            printf("Values must be from 0 to %d\n", MAXVALUE);
    }

    qsort(cnt, MAXVALUE + 1, sizeof *cnt, cmp_count);

    for (int i = 0; i <= MAXVALUE; i++)
        if (cnt[i].count > 0)
             printf("%2d %4d\n", cnt[i].index, cnt[i].count);

    return 0;
}

Edit & run on cpp.sh

Last edited on Mar 23, 2019 at 4:44pm

Mar 23, 2019 at 4:48pm

Duthomhas (13270)

You are not printing the sorted numbers. (You are printing stuff about your histogram.)

puts( "(sorted)" );
for (i = 0; i < l; i++)
  printf( "%d ", a[i] );
printf( "\n" );

It would help you significantly to use better variable names. Good variable names help you to understand an algorithm. Bad variable names confuse you.

This particular counting sort capitalizes on the equivalence == identity optimization. See more here:
http://www.cplusplus.com/faq/sequences/sequencing/sort-algorithms/counting-sort/

Hope this helps.

Mar 24, 2019 at 9:17am

lastchance (6980)

Are you sure that you want to use counting sort?

#include <iostream>
#include <fstream>
#include <sstream>
#include <algorithm>
#include <map>
#include <vector>
using namespace std;

using PR = pair<int,int>;

int main()
{
// ifstream in( "input.txt" );
   istringstream in( "42 42 34 26 42 35 34 47 47" );
   map<int,int> freq;
   for ( int i; in >> i; ) freq[i]++;
   vector<PR> rank( freq.begin(), freq.end() );
   sort( rank.begin(), rank.end(), []( PR a, PR b ){ return a.second > b.second; } );
   for ( PR p : rank ) cout << p.first << '\t' << p.second << '\n';
}

Edit & run on cpp.sh

Last edited on Mar 24, 2019 at 9:17am

Mar 24, 2019 at 10:18am

FreeSocks (35)

Thanks for your input guys.

@lastchance. Yes i'm trying to get the counting sorth algorithm as i have shown to work on this. Sadly the task i'm trying to follow requires me not to use vector or map.

I understand your answer is really logical, but i was really wondering if there is some way with the structure i gave as an example to make it work. Or it won't be really possible ?

I manage to use the STL sort algorithm to sort array b, that holds the frequencies. But i can't really find a way to say that that frequency matches that indice from array 'a'.

Last edited on Mar 24, 2019 at 10:21am

Mar 24, 2019 at 5:42pm

Duthomhas (13270)

Pay attention now.

First, stop using the word “frequency”. It is clouding your understanding.

    frequency = number of occurrences / total number of samples

Counting sort does not care about frequency. It works on individual number of occurrences directly.

Second, do yourself a favor and click the link I gave you, and take the time to at least watch the animation.
http://www.cplusplus.com/faq/sequences/sequencing/sort-algorithms/counting-sort/
Understanding how the sort works, from start to finish, makes all the difference. Trying to understand code you obtained from somewhere is almost impossible without significant experience — experience you do not have as a beginner.

To recap: Given your input set:
    42 42 34 26 42 35 34 47 47
We can put it in a histogram, which lists each pair as (element, number of times element appears)
    0   0     1   0     2   0     ...     26  1 ← nonzero     ...     34  2 ← nonzero     35  1 ← nonzero     ...     42  3 ← nonzero     ...     47  2 ← nonzero     ...     49  0     50  0
One of the important characteristics of the histogram is that the keys are sorted. This happens naturally when you use an array as the histogram, where the indices into the array are the values being sorted, and the elements of the array are the number of times the key appears in the input.

Notice also that the number of times that, say, 15 appeared in the input was zero. Hence, it will appear zero times in the sorted list.

Since the keys are sorted, we already have a sorted list. The only thing left to do is repeat each element the number of times it appeared in the input.
    26 34 34 35 42 42 42 47 47
You should see that there is a direct correlation between the histogram and the sorted array. The link I gave you takes special care to cover this property — it is the basic premise behind a counting sort, the reason that a counting sort works.

You must understand this in order to understand a counting sort.

So far, you have used the equivalence == identity property of integers to short-circuit your counting sort. If that is sufficient for your grade, then good enough. But you ought to be aware that it is only half of a proper counting sort. (Read the link!)

Good luck.

Mar 24, 2019 at 7:39pm

FreeSocks (35)

@Duthomhas

Hello! Thanks for both of your post.

I believe I am not phrasing myself right and it is causing some issues.

The reason i use frequency is because that is the only word i kno to explain myself.
I do understand that by the basic nature of the counting sort, it sorts the keys. I did enter your link, i did watch the animation and i have seen many other animations and i do understand the direct correlation. And i also see that we repeat each element the number of time it appears.

What I was trying to get is. If there is a way to modify the standart structure i have at the moment so that when i display it. it shows based on the count. Instead of just showing the sorted keys and their occurances directly.

Thank you!

Last edited on Mar 24, 2019 at 7:40pm

Mar 24, 2019 at 8:21pm

lastchance (6980)

No std::vector, no std::map. If you don't want std::sort then you will have to write your own.

#include <iostream>
#include <fstream>
#include <sstream>   // only to fake the input
#include <algorithm>
using namespace std;

const int MAX = 50;
using PR = pair<int,int>;

int main()
{
// ifstream in( "input.txt" );
   istringstream in( "42 42 34 26 42 35 34 47 47" );
   
   // Basic counting
   int freq[1+MAX] = { 0 };
   for ( int i; in >> i; ) freq[i]++;
   
   // Size of array required
   int size = 0;
   for ( int i = 0; i <= MAX; i++ ) if ( freq[i] ) size++;
   
   // Stick values and their frequencies (yes, frequencies!) in an array of pairs   
   PR *rank = new PR[size];
   for ( int i = 0, pos = 0; i <= MAX; i++ ) if ( freq[i] ) rank[pos++] = { i, freq[i] };   
  
   // Create ranking, sorted by VALUES, not KEYS   
   sort( rank, rank + size, []( PR a, PR b ){ return a.second > b.second; } );
   for ( int i = 0; i < size; i++ ) cout << rank[i].first << '\t' << rank[i].second << '\n';
}

Edit & run on cpp.sh

Last edited on Mar 24, 2019 at 8:32pm

Mar 24, 2019 at 8:32pm

FreeSocks (35)

@lastchance

Thanks a lot. Pretty much what i was aiming at.

Thank you !

Mar 24, 2019 at 8:38pm

dutch (2548)

Thanks a lot. Pretty much what i was aiming at.

But I already did that in my post, except that I did it in C since that's what you were using.

Mar 24, 2019 at 8:47pm

lastchance (6980)

dutch wrote:
But I already did that in my post

Ah, true.

Sorry, @dutch, I came straight from Fortran to C++ without this scanf/printf malarky. I'm pretty hopeless at pure C. But yes, @FreeSocks, on inspection that was the gist of @dutch's code, so you got an equivalent answer earlier from @dutch if that is what you want. I don't see any sensible way of avoiding the second qsort or sort once the initial counting sort is done.

I was just trying to rewrite my std::map/std::vector code (which I vastly prefer!) without actually using std::map/std::vector!

Last edited on Mar 24, 2019 at 8:59pm

Mar 24, 2019 at 10:14pm

Duthomhas (13270)

I believe I am not phrasing myself right and it is causing some issues.

What I was trying to get is. If there is a way to modify the standart structure i have at the moment so that when i display it. it shows based on the count.

@lastchance

Thanks a lot. Pretty much what i was aiming at.

Ah, so, it appears I have totally misread your intention.
You mention that you are using a counting sort.
You go through the trouble of actually performing a counting sort.

But you are only interested in the histogram.

Why are you bothering to use a counting sort then?

ANyway, glad you got it to do what you wanted...

Mar 24, 2019 at 10:28pm

FreeSocks (35)

It seems there are many confusions around ^^ And it's largely due to my lack of explanation. I'm sorry about that.

@dutch I'm not using C, its just how i learned to perform this type of counting sort and since one language is based on the other, it might look like it. But i'm in fact using it in C++. And i do use a sort to sort stuff out, but i was using it wrongly thats where @lastchance helped me realise it a bit better.

Another reason i might have misunderstood all examples is, because I'm following too strictly the way i learned the count sort which is with the structure i used in my first post. Which i assume can be modified in all the different ways you guys showed me, but instead going out of the box i stayed inside it and made a mess.

@Duthomhas I'm trying to solve a hackerank from a old university practice task(nothing current its an old one). And i decided that count sort will be the most logical one since we had to keep the count of each of the numbers and since the task required a algorithm(like counting sort, merge sort, bubble sort). Also my C or C++ is not that amazing(i admit im trying to learn it, i do far better in some more modern langauges which might be worse or better..). So thats why i use this structure and thats why i did it the way i did it. If there is a better algorithm im happy to learn it.

All in all. Thank you for your replies i will for sure review them indepth!

P.S: @lastchance i didn't want to use std::map or std::vector because i'm not sure the task wants you to do it, but rather use self-written algorithm like the counting sort instead of using a map or vector and do stuff with it. But i might be wrong.(its a pretty old task given in the university i am in currently and i was trying to solve it to practice algorithm, i should ask someone for clarification next time)

Last edited on Mar 24, 2019 at 10:37pm

Mar 25, 2019 at 4:15am

Duthomhas (13270)

Ah, well, I am more impressed than I was if you managed that sort by yourself.
Sorry for having been so inattentive and dismissive. I was out of line.

If you want to see something using modern C++, here you go. :O)

These two sorts rely on the magic key sorting properties of the std::map container. An array is nice, but is significantly limited in terms of input range. A map has no such limitations, and is O(log n) for operations... which means we don’t get the nice O(n) that a counting sort with an integer array histogram promises.

There are two forms:

  • equivalence != identity
     This is for elements that may compare as equal, but are not actually identical.
     For example, sorting people based on their last names.
     Counting sort is a stable sort.

  • equivalence == identity
     This is for elements that are actually identical when they compare as equal, like integers.
     The nice property of this is that we don’t need to keep the original values;
     We can just re-create them from the histogram.
     It also means we can sort the sequence in-place (not counting the histogram).

In short, in a counting sort the histogram does all the work of actually sorting. This is true of both a simple integer array histogram and a std::map histogram.

#include <iostream>
#include <iterator>
#include <map>
#include <numeric>
#include <type_traits>
#include <boost/range/adaptor/reversed.hpp>
using boost::adaptors::reverse;

//-------------------------
// Counting Sort
//-------------------------
// equivalence != identity
//
template <typename Container>
Container counting_sort_equivalence( const Container& xs )
{
  Container result( xs.size() );

  // Histogram the input sequence
  std::map <typename Container::value_type, std::size_t> histogram;
  for (auto x : xs) histogram[ x ] += 1;

  // Partial sum histogram → indices into the output sequence
  std::size_t sum = 0;
  for (auto& kv : histogram) sum = kv.second += sum;

  // Copy input values to their correct place in the output
  for (auto x : reverse( xs )) result[ --histogram[ x ] ] = x;

  return result;
}

//-------------------------
// Counting Sort
//-------------------------
// equivalence == identity
//
template <typename Container>
void counting_sort_identity( Container& xs )
{
  // Histogram the input sequence
  std::map <typename Container::value_type, std::size_t> histogram;
  for (auto x : xs) histogram[ x ] += 1;

  // Duplicate histogram keys into the output
  std::size_t n = 0;
  for (auto kv : histogram)
    while (kv.second--)
      xs[ n++ ] = kv.first;
}

//-------------------------
int main()
{
  std::vector <long> xs{
    std::istream_iterator <long> ( std::cin ), 
    std::istream_iterator <long> () 
  };

  auto ys = counting_sort_equivalence( xs );
  for (auto y : ys) std::cout << y << " "; std::cout << "\n";

  counting_sort_identity( xs );
  for (auto x : xs) std::cout << x << " "; std::cout << "\n";
}

Edit & run on cpp.sh

Hope you enjoy!

[edit] Fixed a typo.

Last edited on Mar 25, 2019 at 2:18pm

Mar 25, 2019 at 7:55am

FreeSocks (35)

@Duthomhas.

Thank you very much for your solution.

This solutions are all great and extensive.
I also spoke with the person who put out this task last year. And indeed what i was trying to do was wrong like you guys mentioned and i boxed myself really hard into doing it the wrong way. All the inputs given by you guys are really what i was supposed to do and implement, but my lack of attention caused me this mishap.

Thanks once again. Will re-read all the solutions to make sure i get the proper gist of it!

Topic archived. No new replies allowed.