hi! this is a sorting algorithm I came up with the other day.
it's for integer sorting but I suppose you could sort any set,
all you have to do is write an order preserving mapping from
your set to a set of integers. It has O(n*k) time complexity
and O(n*m) space complexity, where k depends on the distribution
of the data and the value of the maximum element, and m only
depends on the distribution of the data (I ll explain exactly
how k and m depend on these things). So, one could say that
as far as it concerns the number of the elements it's linear in
both time and space. But let's see how exactly this works...
suppose you have to sort the following set:
{18,16,12,14,10}
you could do it in linear time if you did this:
(1) run through the array once to determine the min and max elements
(2) set a y=a*x+b mapping which sends min to 0 (first element index)
and max to 4 (last element index, array size -1)
(3) create a new array of 5 ints
(4) map the initial data to the new array
and the new array would be a sorted version of the first!
now, you could say: "LOL, you can't do that with any array,
it's necessary that the data distribution is uniform!"
(which is equivalent to the fact that the distance of two
neighbour elements in the sorted array is constant
-in this example it's two)
yes, I am aware of that fact... so a first approach to sorting this
for example: {9,1,4,25,16}, would be the following:
(1) determine min and max
(2) set a linear mapping which sends min to 0 and max to 23 (max-min-1)
(3) create a new array of 24 (max-min) ints
(4) create a new array of 5 ints
(to hold the "landing" places of the mapping to the previous array)
(5) do the mapping, extract the sorted data from the right places
and there you have your sorted array!
now, you could say: "wow... so if you had to sort {2,4,100,1,3}
you'd reserve memory for 99 ints to sort a five element array?!?!?!"
no, I wouldn't do that :P so, the solution to this problem is to
make the data distribution uniform, using a transformation which
preserves order, and then sort the transformed data. Let's take the previous
example: {9,1,4,25,16} and let's take the square root of each element...
{3,1,2,5,4} :O that's a set of numbers with uniform distribution!
you can sort it with a five element array!
so, you could check if max-min is more than 2*n, and transform the data
until it becomes less than or equal to 2*n, and than do the sorting
with a new array of 2*n (linear space) elements. The time taken for
one application of the transformation is ofc linear and the times the
transformation needs to be applied obviously depend on the maximum
element of the array. (that's how the sorting time depends on the maximum element)
so far so good, but there's just one more hidden problem...
as you transform the data, elements that were initially very close
to each other could end up to the same transformed value...let's
see an example of this problem: suppose you have to sort this:
{5,3,4,1,2,101,103,104,102,105,1004,1005,1002,1001,1003}
transforming this until max-min is <= 2*n would end up with something like:
{a,a,a,a,a,b,b,b,b,b,c,c,c,c,c}, so what would you do now?
the answer is simple, when you create your mapping target, 2*n sized array,
create it not as an array of ints but as an array of vectors of ints ;)
this way you can send the values that are transformed to a (5,3,4,1,2)
to one vector, the values that are transformed to b (101,103,104,102,105)
to another and the values that are transformed to c (1004,1005,1002,1001,1003)
to yet another, and then, sort those vectors with the same algorithm and
there you have it! Now, this is where you can see that the sorting
time and space depend on the data distribution:
|.........................................................................
|..........................................................*.............
|........................................................*.*............
|......................*...............................*...*............
|.....*.............*..*.............................*...*............
|....*.*...........*..*............................*.....*...........
|....*.*..........*....*...........................*.....*...........
|...*...*.........*.....*........................*.........*..........
|...*.A.*........*..B..*......................*.....C....*........
|..*.....*.......*........*....................*..............*.......
|..*......*.*.*.............*.*.*........*.*................*.....
|.*...................................*.*.*.*......................*..
--------------------------------------------------------------
the "mountains" A, B and C would collapse into the values a, b and c
after the transformation, and then each one of them would have to be
sorted separately. so the more the "mountains" on the data distribution
the more the sorting time. and the sorting space depends on the
"mountain" that has the maximum size of elements, because you sort
one "mountain" at a time. (OR you could use multithreading here to
reduce time at the cost of space, anything you like better ;) )
that's all there is to it, pretty much... I wrote a program which implements
this algorithm in c++ and compares it to other known algorithms. It most certainly beats the crap out of selection and bubble sort (I wouldn't be posting this if it didn't at least do that :P ) and I want you to notice that when the data size is big and the data range small, it also beats quicksort! However it doesn't seem to be able to win against the standard c sort :/ What algorithm does Stroustrup use anyway?!?! (however, if it's some quicksort variation, I suppose that with more data size, let's say a billion elements, mine would be faster :P )
I post the code in many parts coz it doesn't fit in one -.-
part 1 of my code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
|
#include <cstdlib> //rand, srand, time, clock etc...
#include <iostream> //console input/output
#include <fstream> //file input/output
#include <cmath> //log(), sqrt() etc...
#include <vector> //vectors duh...
#include <algorithm> //for stl sort
using namespace std;
//**********************************************************
//declarations:
//**********************************************************
//my timer :D counting time elapsed in milliseconds from start() to stop()
class Timer
{
unsigned long m_counter;
bool m_running;
unsigned long m_t1, m_t2;
public:
Timer(){m_counter=0;}
inline void start(){m_running=true; m_t1=clock();}
inline void stop(){m_running=false; m_t2=clock(); m_counter+=m_t2-m_t1;}
inline void reset() {m_counter=0;}
inline unsigned long counter() {return (m_counter*1000)/CLOCKS_PER_SEC;}
};
//needed for my algorithm
//associates initial data values with transformed values
//so that I don't have to do the inverse transformation
//(which would after all be impossible since there is
//loss in precision during the transformation)
struct Pair
{
int ival; //initial value
double tval; //transformed value
};
//my algorithm
void blink_sort(int * source, int * dest, int size);
//blink_sort variation for transformed data
void trans_sort(Pair * source, Pair * dest, int size);
//the transformation used to make data distribution uniform
inline void transform(double & val)
{
val=1+log(val);
val*=val;
val*=10;
}
//make c sort have the desired signature
void c_sort(int * source, int * dest, int size)
{
sort(dest,dest+size);
}
//*******************************************************
//some quicksort algorithm I found on the net
void swap(int *x,int *y);
int choose_pivot(int i,int j );
void quicksort(int list[],int m,int n);
//********************************************************
//make quicksort have the desired signature
void quick_sort(int * source, int * dest, int size)
{
quicksort(dest,0,size-1);
}
//some noob sorting algorithms :P
void selection_sort(int * source, int * dest, int size);
void bubble_sort(int * source, int * dest, int size);
//random array generator
void random_array(int * array,int size,int min, int max);
int random(int min, int max);
void print_array(int * array,int size,ostream & os=cout);
const int ARR_SIZE[6]={100,1000,10000,100000,200000,500000};
const int DATA_RANGE[6]={100,1000,10000,50000,1000000,1000000000};
const int ARR_SIZE_SIZE=6;
const int DATA_RANGE_SIZE=6;
const int WIDTH=2; //needed for my algorithm
|