integer sorting in linear time and space

Forum

Forum
Windows Programming
integer sorting in linear time and space

integer sorting in linear time and space :P

Mar 11, 2010 at 11:42pm

hi! this is a sorting algorithm I came up with the other day.
it's for integer sorting but I suppose you could sort any set,
all you have to do is write an order preserving mapping from
your set to a set of integers. It has O(n*k) time complexity
and O(n*m) space complexity, where k depends on the distribution
of the data and the value of the maximum element, and m only
depends on the distribution of the data (I ll explain exactly
how k and m depend on these things). So, one could say that
as far as it concerns the number of the elements it's linear in
both time and space. But let's see how exactly this works...

suppose you have to sort the following set:
{18,16,12,14,10}
you could do it in linear time if you did this:
(1) run through the array once to determine the min and max elements
(2) set a y=a*x+b mapping which sends min to 0 (first element index)
and max to 4 (last element index, array size -1)
(3) create a new array of 5 ints
(4) map the initial data to the new array
and the new array would be a sorted version of the first!

now, you could say: "LOL, you can't do that with any array,
it's necessary that the data distribution is uniform!"
(which is equivalent to the fact that the distance of two
neighbour elements in the sorted array is constant
-in this example it's two)

yes, I am aware of that fact... so a first approach to sorting this
for example: {9,1,4,25,16}, would be the following:
(1) determine min and max
(2) set a linear mapping which sends min to 0 and max to 23 (max-min-1)
(3) create a new array of 24 (max-min) ints
(4) create a new array of 5 ints
(to hold the "landing" places of the mapping to the previous array)
(5) do the mapping, extract the sorted data from the right places
and there you have your sorted array!

now, you could say: "wow... so if you had to sort {2,4,100,1,3}
you'd reserve memory for 99 ints to sort a five element array?!?!?!"

no, I wouldn't do that :P so, the solution to this problem is to
make the data distribution uniform, using a transformation which
preserves order, and then sort the transformed data. Let's take the previous
example: {9,1,4,25,16} and let's take the square root of each element...
{3,1,2,5,4} :O that's a set of numbers with uniform distribution!
you can sort it with a five element array!

so, you could check if max-min is more than 2*n, and transform the data
until it becomes less than or equal to 2*n, and than do the sorting
with a new array of 2*n (linear space) elements. The time taken for
one application of the transformation is ofc linear and the times the
transformation needs to be applied obviously depend on the maximum
element of the array. (that's how the sorting time depends on the maximum element)

so far so good, but there's just one more hidden problem...
as you transform the data, elements that were initially very close
to each other could end up to the same transformed value...let's
see an example of this problem: suppose you have to sort this:
{5,3,4,1,2,101,103,104,102,105,1004,1005,1002,1001,1003}
transforming this until max-min is <= 2*n would end up with something like:
{a,a,a,a,a,b,b,b,b,b,c,c,c,c,c}, so what would you do now?
the answer is simple, when you create your mapping target, 2*n sized array,
create it not as an array of ints but as an array of vectors of ints ;)
this way you can send the values that are transformed to a (5,3,4,1,2)
to one vector, the values that are transformed to b (101,103,104,102,105)
to another and the values that are transformed to c (1004,1005,1002,1001,1003)
to yet another, and then, sort those vectors with the same algorithm and
there you have it! Now, this is where you can see that the sorting
time and space depend on the data distribution:

|.........................................................................
|..........................................................*.............
|........................................................*.*............
|......................*...............................*...*............
|.....*.............*..*.............................*...*............
|....*.*...........*..*............................*.....*...........
|....*.*..........*....*...........................*.....*...........
|...*...*.........*.....*........................*.........*..........
|...*.A.*........*..B..*......................*.....C....*........
|..*.....*.......*........*....................*..............*.......
|..*......*.*.*.............*.*.*........*.*................*.....
|.*...................................*.*.*.*......................*..
--------------------------------------------------------------

the "mountains" A, B and C would collapse into the values a, b and c
after the transformation, and then each one of them would have to be
sorted separately. so the more the "mountains" on the data distribution
the more the sorting time. and the sorting space depends on the
"mountain" that has the maximum size of elements, because you sort
one "mountain" at a time. (OR you could use multithreading here to
reduce time at the cost of space, anything you like better ;) )

that's all there is to it, pretty much... I wrote a program which implements
this algorithm in c++ and compares it to other known algorithms. It most certainly beats the crap out of selection and bubble sort (I wouldn't be posting this if it didn't at least do that :P ) and I want you to notice that when the data size is big and the data range small, it also beats quicksort! However it doesn't seem to be able to win against the standard c sort :/ What algorithm does Stroustrup use anyway?!?! (however, if it's some quicksort variation, I suppose that with more data size, let's say a billion elements, mine would be faster :P )

I post the code in many parts coz it doesn't fit in one -.-

part 1 of my code:

#include <cstdlib> //rand, srand, time, clock etc...
#include <iostream> //console input/output
#include <fstream> //file input/output
#include <cmath> //log(), sqrt() etc...
#include <vector> //vectors duh...
#include <algorithm> //for stl sort
using namespace std;

//**********************************************************
//declarations:
//**********************************************************

//my timer :D counting time elapsed in milliseconds from start() to stop()
class Timer
{
      unsigned long m_counter;
      bool m_running;
      unsigned long m_t1, m_t2;
      
      public:
      Timer(){m_counter=0;}
      inline void start(){m_running=true; m_t1=clock();}
      inline void stop(){m_running=false; m_t2=clock(); m_counter+=m_t2-m_t1;}
      inline void reset() {m_counter=0;}
      inline unsigned long counter() {return (m_counter*1000)/CLOCKS_PER_SEC;}
      
};



//needed for my algorithm
//associates initial data values with transformed values
//so that I don't have to do the inverse transformation
//(which would after all be impossible since there is 
//loss in precision during the transformation)
struct Pair
{
       int ival;    //initial value
       double tval; //transformed value
};

//my algorithm
void blink_sort(int * source, int * dest, int size);

//blink_sort variation for transformed data
void trans_sort(Pair * source, Pair * dest, int size);

//the transformation used to make data distribution uniform
inline void transform(double & val)
{
       val=1+log(val);
       val*=val;
       val*=10;
}

//make c sort have the desired signature
void c_sort(int * source, int * dest, int size)
{
     sort(dest,dest+size);
}

//*******************************************************
//some quicksort algorithm I found on the net
void swap(int *x,int *y);
int choose_pivot(int i,int j );
void quicksort(int list[],int m,int n);
//********************************************************

//make quicksort have the desired signature
void quick_sort(int * source, int * dest, int size)
{
     quicksort(dest,0,size-1);
}

//some noob sorting algorithms :P
void selection_sort(int * source, int * dest, int size);
void bubble_sort(int * source, int * dest, int size);


//random array generator
void random_array(int * array,int size,int min, int max);
int random(int min, int max);

void print_array(int * array,int size,ostream & os=cout);

const int ARR_SIZE[6]={100,1000,10000,100000,200000,500000};
const int DATA_RANGE[6]={100,1000,10000,50000,1000000,1000000000};
const int ARR_SIZE_SIZE=6;
const int DATA_RANGE_SIZE=6;

const int WIDTH=2; //needed for my algorithm

Last edited on Mar 13, 2010 at 11:30pm

Mar 11, 2010 at 11:44pm

m4ster r0shi (2201)

part 2 of the code:

const char * sort_func_name[]=
{"blink sort","c sort","quicksort","selection sort","bubble sort"};

typedef void (*p_sort_func)(int*,int*,int);

const p_sort_func sort_func_table[]=
{blink_sort,c_sort,quick_sort,selection_sort,bubble_sort};

int main()
{
    int  * array;
    int * sorted;
    Timer timer;
    int i,j,k,l,m;
    
    ofstream fout;
    fout.open("output.txt");
    
    for (i=0; i<ARR_SIZE_SIZE; i++)
    {
        for (j=0; j<DATA_RANGE_SIZE; j++)
        {
            array=new int[ARR_SIZE[i]];
            sorted=new int[ARR_SIZE[i]];
            
            random_array(array,ARR_SIZE[i],1,DATA_RANGE[j]);
            
            /*fout << "unsorted array:"<< endl;
            print_array(array,ARR_SIZE[i],fout);
            fout << '\n' << endl;
            */
            
            cout << "(array size: " << ARR_SIZE[i];
            cout << " data range: 1 - " << DATA_RANGE[j] << ')' << endl;
            
            fout << "(array size: " << ARR_SIZE[i];
            fout << " data range: 1 - " << DATA_RANGE[j] << ')' << endl;
            
            for (k=0; k<5; k++)
            {
                //skip selection and bubble for big array size
                if (k==3 && ARR_SIZE[i]>20000) break;
                
                //needed for c_sort, quick_sort, selection sort & bubble sort
                for (m=0; m<ARR_SIZE[i]; m++) sorted[m]=array[m];
                
                cout << "sorting with " << sort_func_name[k] << "..." << endl;
                
                timer.reset();
                timer.start();
                sort_func_table[k](array,sorted,ARR_SIZE[i]);
                timer.stop();
                
                //to prove that my algorithm 
                //actually sorts the data (LOL)
                /*if (k==0)
                {
                   fout << "sorted array:" << endl;
                   print_array(sorted,ARR_SIZE[i],fout);
                   fout << '\n' << endl;
                }*/
                
                fout << sort_func_name[k] << " time: ";
                fout << timer.counter() << " millisecond(s)" << endl;
                
            }
            fout << endl;
            delete[] array;
            delete[] sorted;
        }
    }
    
    fout.close();
    cout << "check your output.txt! :P" << endl;
    system("pause");
    return 0;
}

//**********************************************************
//implementation of functions used:
//**********************************************************

int random(int min, int max)
{
    if (min==max) return min;
    if (min>max) return random(max,min);
    
    int result;
    result=int(0.5+min+(max-min)*(rand()/double(RAND_MAX)));
    return result;
}

void random_array(int * array,int size,int min=1, int max=1000000)
{
     srand(time(0));
     
     for (int i=0; i<size; i++)
     {
         array[i]=random(min,max);
     }
}

void print_array(int * array,int size,ostream & os)
{
     int i;
     os << '{';
     for (i=0; i<size; i++)
     {
         os << array[i] << ' ';
         if (i%20==19) os << '\n';
     }
     os << '}';
}

//****************************************************************
//some quicksort algorithm I found on the net
void swap(int *x,int *y)
{
   int temp;
   temp = *x;
   *x = *y;
   *y = temp;
}

int choose_pivot(int i,int j )
{
   return((i+j) /2);
}

void quicksort(int list[],int m,int n)
{
   int key,i,j,k;
   if( m < n)
   {
      k = choose_pivot(m,n);
      swap(&list[m],&list[k]);
      key = list[m];
      i = m+1;
      j = n;
      while(i <= j)
      {
         while((i <= n) && (list[i] <= key))
                i++;
         while((j >= m) && (list[j] > key))
                j--;
         if( i < j)
                swap(&list[i],&list[j]);
      }
	  // swap two elements
      swap(&list[m],&list[j]);
	  // recursively sort the lesser list
      quicksort(list,m,j-1);
      quicksort(list,j+1,n);
   }
}
//****************************************************************

void selection_sort(int * source, int * dest, int size)
{
     int min, temp;
     int i,j, minpos;
     
     //for (i=0; i<size; i++) dest[i]=source[i];
     
     
     
     for (i=0; i<size; i++)
     {
         min=dest[i];
         minpos=i;
         for (j=i+1; j<size; j++)
         {
             if (min>dest[j]) {min=dest[j]; minpos=j;}
         }
         if (minpos!=i)
         {
            temp=dest[i];
            dest[i]=min;
            dest[minpos]=temp;
         }
     }
}

void bubble_sort(int * source, int * dest, int size)
{
     bool sorted=true;
     int i;
     int temp;
     
    //for (i=0; i<size; i++) dest[i]=source[i];
     
     int count=0;
     while (true)
     {
           sorted=true;
           for (i=0; i<size-1-count; i++)
           {
                if (dest[i]>dest[i+1])
                {
                   temp=dest[i];
                   dest[i]=dest[i+1];
                   dest[i+1]=temp;
                   sorted=false;
                }   
           }
           if (sorted==true) break;
           count++;
     }
}

Last edited on Apr 3, 2010 at 9:23am

Mar 11, 2010 at 11:44pm

m4ster r0shi (2201)

part 3 of the code:

void blink_sort(int * source, int * dest, int size)
{
     //find min,max of source
     int min,max;
     int i,j,n;
     
     min=max=source[0];
     for (i=1; i<size; i++)
     {
         if (min>source[i]) min=source[i];
         if (max<source[i]) max=source[i];
     }
     
     
     //if min==max then all elements are equal, dest=source
     
     if (min==max)
     {
        for (i=0; i<size; i++)
        {
            dest[i]=source[i];
        }
        return;
     }
     
     //if max-min<=WIDTH*size
     //allocate memory for WIDTH*size vector<int>s
     //map the source to the new array
     //blink_sort the mapped elements with count>1
     //copy the sorted data to dest
     //free the buffer
     
     if (max-min<=WIDTH*size)
     {
         //allocate memory for the vectors
         vector<int> * buffer=new vector<int>[int(WIDTH*size)];
         
         //set the mapping parameters
         double a,b; //y=a*x+b
         /*
         0=a*min+b
         WIDTH*size-1=a*max+b
         a=(WIDTH*size-1)/(max-min)
         b=-a*min
         */
         
         a=(WIDTH*size-1)/double(max-min);
         b=-a*min;
         
         //do the mapping
         for (i=0; i<size; i++)
         {
             buffer[int(a*source[i]+b)].push_back(source[i]);
             
         }
         
         //blink_sort the elements if needed
         for (i=0; i<WIDTH*size; i++)
         {
             n=buffer[i].size();
             if (n>1)
             {
                int * new_source=new int[n];
                int * new_dest=new int[n];
                for (j=0; j<n; j++)
                {
                    new_source[j]=buffer[i][j];
                }
                
                blink_sort(new_source,new_dest,n);
                
                for (j=0; j<n; j++)
                {
                    buffer[i][j]=new_dest[j];
                }
                delete[] new_source;
                delete[] new_dest;
             }
         }
         
         //copy the sorted data to dest
         int index=0;
         for (i=0; i<WIDTH*size; i++)
         {
             n=buffer[i].size();
             if (n>0)
             {
                for (j=0; j<n; j++)
                {
                    dest[index]=buffer[i][j];
                    index++;
                }
             }
         }
         
         //bb buffer...
         delete[] buffer;
         
         
         return;        
     }
     
     
     
     //if max-min>WIDTH*size
     //make data pairs from source
     //transform until max-min<=WIDTH*size
     //trans_sort (blink_sort for transformed data)
     
     Pair * p_source=new Pair[size];
     Pair * p_dest=new Pair[size];
     for (i=0; i<size; i++) p_source[i].ival=p_source[i].tval=source[i];
     
     double dmax=max, dmin=min;
     while (dmax-dmin>WIDTH*size)
     {
           for (i=0; i<size; i++) transform(p_source[i].tval);
           transform(dmin);
           transform(dmax);
     }
     
     trans_sort(p_source,p_dest,size);
     
     //copy the sorted data to dest
     int index=0;
     for (i=0; i<size; i++)
     {
         dest[i]=p_dest[i].ival;
     }
     
     delete[] p_source;
     delete[] p_dest;
     
     return;
}

void trans_sort(Pair * source, Pair * dest, int size)
{
      //find min,max of source
     double min,max;
     int i,j,n;
     
     min=max=source[0].tval;
     for (i=1; i<size; i++)
     {
         if (min>source[i].tval) min=source[i].tval;
         if (max<source[i].tval) max=source[i].tval;
     }
     
     
     //if min==max then all elements are equal, dest=source
     
     if (min==max)
     {
        for (i=0; i<size; i++)
        {
            dest[i].ival=source[i].ival;
            dest[i].tval=source[i].tval;
        }
        return;
     }
     
     //max-min<=WIDTH*size is always true at this point
     //allocate memory for WIDTH*size vector<Pair>s
     //map the source to the new array
     //trans_sort the mapped elements with count>1
     //copy the sorted data to dest
     //free the buffer
     
   
    //allocate memory for the vectors
    vector<Pair> * buffer=new vector<Pair>[int(WIDTH*size)];
    
    //set the mapping parameters
    double a,b; //y=a*x+b
    /*
    0=a*min+b
    WIDTH*size-1=a*max+b
    a=(WIDTH*size-1)/(max-min)
    b=-a*min
    */
    
    a=(WIDTH*size-1)/double(max-min);
    b=-a*min;
    
    //do the mapping
    for (i=0; i<size; i++)
    {
        buffer[int(a*(source[i].tval)+b)].push_back(source[i]);
    }
    
    //trans_sort the elements if needed
    for (i=0; i<WIDTH*size; i++)
    {
        n=buffer[i].size();
        if (n>1)
        {
           int * new_source=new int[n];
           int * new_dest=new int[n];
           for (j=0; j<n; j++)
           {
               new_source[j]=buffer[i][j].ival;
               //new_source[j].tval=buffer[i][j].tval;
           }
           
           blink_sort(new_source,new_dest,n);
           
           for (j=0; j<n; j++)
           {
               buffer[i][j].ival=new_dest[j];
           }
           delete[] new_source;
           delete[] new_dest;
        }
    }
    
    //copy the sorted data to dest
    int index=0;
    for (i=0; i<WIDTH*size; i++)
    {
        n=buffer[i].size();
        if (n>0)
        {
           for (j=0; j<n; j++)
           {
               dest[index].ival=buffer[i][j].ival;
               index++;
           }
        }
    }
    
    //bb buffer...
    delete[] buffer;
    
    return;        
   
}

Mar 12, 2010 at 12:15pm

PJYang2010 (5)

I ran the above program with Visual C++ 6.0, with c sort function not implemented, and I want to list part of the result below, just for comparison:

(array size: 200000 data range: 1 - 100)
blink sort time: 344 millisecond(s)
quick_sort time: 640 millisecond(s)
selection_sort time: 59672 millisecond(s)

(array size: 200000 data range: 1 - 1000)
blink sort time: 390 millisecond(s)
quick_sort time: 141 millisecond(s)
selection_sort time: 60156 millisecond(s)

(array size: 200000 data range: 1 - 10000)
blink sort time: 500 millisecond(s)
quick_sort time: 94 millisecond(s)
selection_sort time: 59328 millisecond(s)

(array size: 200000 data range: 1 - 50000)
blink sort time: 656 millisecond(s)
quick_sort time: 109 millisecond(s)
selection_sort time: 59235 millisecond(s)

(array size: 200000 data range: 1 - 1000000)
blink sort time: 719 millisecond(s)
quick_sort time: 93 millisecond(s)
selection_sort time: 59266 millisecond(s)

(array size: 200000 data range: 1 - 1000000000)
blink sort time: 703 millisecond(s)
quick_sort time: 109 millisecond(s)
selection_sort time: 59469 millisecond(s)

(array size: 500000 data range: 1 - 100)
blink sort time: 937 millisecond(s)
quick_sort time: 3781 millisecond(s)
selection_sort time: 375625 millisecond(s)

(array size: 500000 data range: 1 - 1000)
blink sort time: 922 millisecond(s)
quick_sort time: 578 millisecond(s)
selection_sort time: 376547 millisecond(s)

(array size: 500000 data range: 1 - 10000)
blink sort time: 1094 millisecond(s)
quick_sort time: 282 millisecond(s)
selection_sort time: 376375 millisecond(s)

(array size: 500000 data range: 1 - 50000)
blink sort time: 1344 millisecond(s)
quick_sort time: 265 millisecond(s)
selection_sort time: 375813 millisecond(s)

(array size: 500000 data range: 1 - 1000000)
blink sort time: 1328 millisecond(s)
quick_sort time: 266 millisecond(s)
selection_sort time: 375984 millisecond(s)

(array size: 500000 data range: 1 - 1000000000)
blink sort time: 1516 millisecond(s)
quick_sort time: 281 millisecond(s)
selection_sort time: 375860 millisecond(s)

Mar 12, 2010 at 12:25pm

m4ster r0shi (2201)

wow... you actually waited that long to get the results?!?!
anyway, thnx for bothering to test it :D and I suppose you could use
the c sort if u include <algorithm> or something like that (I use Dev c++
and I don't have to include it but with VS I think you have to)

Mar 12, 2010 at 1:36pm

PJYang2010 (5)

oh no, i got the results while i was having supper; Perhaps i happened to forget that the c sort is included in <algorithm>, so i omitted "c_sort" in the line:
const p_sort_func sort_func_table[]={blink_sort,c_sort,quick_sort,selection_sort,bubble_sort};
thanks, i gonna to rerun it with c short included tomorrow and see the result.

Topic archived. No new replies allowed.