O(n) sort function

I've recently come up with a linear time sorting algorithm. It is rather restricted in what it can sort, as it only works with integer types, but it ran faster than the quicksort function provided in the standard library on array sizes over 20,000,000 on my computer. It uses a secondary array to create a distribution table of the values in the array to be sorted then steps through the second array and adds the necessary number of elements to the first one.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define LEN 40000000 //length of array to fill
#define MAXVALUE 400000 //maximum random value to be assigned to an array element


/*
The following function implements a sort function that works in O(n) time in best case 
(all values in the array to be sorted are unique or all the same value) by creating 
a distribution table in a secondary array and then walking through the distribution 
table and filling up the original array from the distribution table.
*/
void linsort (int*A,int len)
{
    int large, small, i, j, index;
    int * B;
    //set large and small equal to the first element in A
    large=A[0];
    small=A[0];
    for (i=1;i<len;i++) //identify the range of values in A
    {
        if (A[i]>large)
            large=A[i];
        else if (A[i]<small)//don't need a separate if statement due to the fact that
            small = A[i]; //a number won't be larger and smaller than large and small respectively
    }
    
    B=(int*)calloc(large-small+1,sizeof(int));//allocate a new array B with a number 
    //of elements equal to the range of values in A off of the heap
    
    for (i=0;i<len;i++)
        B[A[i]-small]++;//make a distribution table for A in array B
        
    for (i=0,index=0;i<large-small+1;i++)//step through the distribution table array
        for (j=0;j<B[i];j++)//add to A a number of elements of value i equal to the value in B[i]
            A[index++]=i+small;//assign i to A[index] and increment index
    
    free (B);//deallocate the memory used for the distribution table array
}

/* 
This function checks if an array of integers is sorted in non decreasing order
and returns 1 if it is sorted properly, and 0 of it is not
*/
int checksort (int*A,int len)
{
    int i;
    for (i=0;i<len-1;i++)
        if (A[i]>A[i+1])
            return 0; //array isn't sorted in non decreasing order
    return 1;
}    

/*
quicksort comparator, for relative speed testing
*/
int compare (const void * a, const void * b)
{
  return ( *(int*)a - *(int*)b );
}


int main ()
{
    int *A, *B;
    int i;
    time_t stime, etime;//two variables to hold start and end times of the sort
    double diff1, diff2;//holds time taken by the sort in seconds
    
    srand(time(NULL));//seed random number generator
    
    A=(int*)malloc(sizeof(int)*LEN);//allocate an array off the heap
    B=(int*)malloc(sizeof(int)*LEN);//allocate an array off the heap  
    
    printf("Generating array 1\n");            
    for (i=0;i<LEN;i++)
        A[i]=rand()% MAXVALUE+1; //randomly generate a value between 0 and MAXVALUE
    
    printf("Generating array 2\n");
    for (i=0;i<LEN;i++)
        B[i]=rand()% MAXVALUE+1; //randomly generate a value between 0 and MAXVALUE
    
    printf("Running sorts\n");
    
    time(&stime);//get time before starting to sort
    linsort(A,LEN);//sort the array with linsort function
    time(&etime);//get time after sorting
    diff1=difftime(etime,stime); //compute difference of two times in seconds
    
    time(&stime);//get time before starting to sort
    qsort(A,LEN,sizeof(int),compare);//standard library quicksort function
    time(&etime);//get time after sorting
    diff2=difftime(etime,stime); //compute difference of two times in seconds
    
    
    /*for (i=0;i<LEN;i++) //uncomment to output the array to standard output
        printf("%i ",A[i]);*/
    //printf("Result of checksort: %i\n",checksort(A,LEN));
    printf("Time taken by linsort:  %lf\n",diff1);
    printf("Time taken by quicksort:  %lf\n",diff2);
    
    //deallocate memory for arrays
    free (A);
    free (B);
    
    fflush(stdin);//flush input buffer
    printf("Press enter to continue");
    getc(stdin);//wait until user enters something before exiting
}


I've got a few questions however, as I've only recently started programming in C (4-5 months).

First of all, has this been done before? Is it already out there but I haven't found it yet?
Second, are my uses of malloc and calloc correct? Are there any memory leaks I've missed?
Third, would quicksort run faster with a different comparator, and if so, could I have an example?
Fourth, how is my code readability? Am I commenting enough, am I indenting properly, etc? Are there any standard practices for formatting I'm unaware of?
http://en.wikipedia.org/wiki/Counting_sort

Technically, it's O(n+k).

About your code, I'd say you're commenting way too much. I mean, you don't need to explain why you use else if instead of if.
Another example:
assign i to A[index] and increment index

It doesn't make it clearer what line 38 does, it's simply a translation from C to plain English.
This is exactly the kind of comments that should be avoided. Not only are they unhelpful, they also produce visual noise that can be quite distracting from the actual code. Even worse if, God forbid, the one reading it doesn't have syntax highlighting.
If you're going to comment on the same line as the code, make sure to leave a generous space between the two. Note, however, that this may annoy people reading from terminals.
Last edited on
Thanks, I hadn't been able to find the name for that previously.

About the commenting, I tend to do that when trying out new things, just so I can get my thoughts straight as to what the code is actually doing and why. I expect I'll do it less as I get more familiar with the language.

Again, thanks, it's appreciated. :)
Yes, one of the differences between qsort() and your algorithm is that your comparisons are inline in the sort function itself whereas qsort() must call a function to compare each element, so for large arrays qsort() spends a lot of time calling compare().

To be fair, you should move your comparison code to a function also.
In your example, you use A twice, effectively passing a sorted array to qsort.

I intialised the arrays with the same values and added a compare function that compared the arrays to confirm correct sorting. I build for debug (so no optimisataions or inlining).

qsort was taking between 53 - 55 seconds on my box. Your algorithm took 5-6 seconds. Replacing inline comparisons in your algorithm with the compare function used by quick sort, it took 7 seconds.

All in all, very impressive.
Let us not ignore the potentially mind-boggling memory use!
Right. For instance, if the array had two elements: 0 and 2^32-1!
Topic archived. No new replies allowed.