Insertion sort

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int array[size]= {84, 69, 94, 91};
int size=4;
int i, j, key;
{
for (j=1; j<size; j++)
{
key= array[j];
for (i=j-1; (i>=0) && (array[i]<key); i++)
{
array[i+1]= array[i];
}
array[i+1]=key;
}
return;
}



For me, among all the sorting algorithms that I have read, this sort is the most hard to understand. Could anyone tell me how this sort works? What is 'j' for and also 'i'? How its process goes? Please help!
I don't believe that noone here can (or is willing to?) explain insertion sort. It really isn't that hard. (Well, except perhaps the version above, for it does not work...)

First of all a correct version without error-prone C arrays:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <iostream>
#include <iterator>
#include <vector>

int main()
        std::vector<int> a; // don't use C arrays...
        std::cout<<"Enter number sequence (end with Ctrl-d)"<<std::endl;
        std::copy(std::istream_iterator<int,char>(std::cin), // why use a self-written loop? This one is better...
                        std::istream_iterator<int,char>(), std::back_inserter(a));
        int j=0, key=0; // *always* initialize variables
        for (int i=1; i<a.size(); ++i) // use pre-increment to avoid unneccessary temorary object
        {
                key= a[i];
                j = i-1;
                while((j >= 0) && (a[j] > key))
                {
                        a[j+1] = a[j];
                        j -= 1;
                }
                a[j+1]=key;
        }
        std::cout<<"Sorted sequence:\n";
        std::copy(a.begin(), a.end(), std::ostream_iterator<int>(std::cout, "\n")); // another loop avoided...
        std::cout<<std::flush;
}

(Actually, the two loops still in the code could be avoided by the usage of std::stable_sort, but that is not the point here, I guess...)

In the outer loop an index i is maintained which points to the location in the array a to which the latter is sorted already (in the beginning, the first element of a is sorted obviously, since a single element cannot contain an inversion).
Then, the next element to be inserted into the sorted sequence is stored in key (this is the first element *after* the sorted sequence, indexed by j). In order to insert it into the sorted sequence (in the right place), all elements > key have to be moved to the right one place, in order to make room for the value stored in key. When the right place is found, the inner loop terminates (condition a[j] > key, or we are at the start of a). After a.size()-1 runs of the outer loop, i equals a.size(). Since i is the index to which a is sorted, the sequence is sorted (after showing that this invariant holds true at all times, you may write 'qed' at this point ;-) )
Last edited on
What exception didn't realize in the above code is that you can not initialize variables when you declare them if you'll initialize them later anyway, as long as you don't use them first.
I'm also pondering his definition of "better loop".

Finally, I want to talk about the use of insertion sort. It's not quite the best sort to be used on completely unsorted arrays. As its name suggests, it's a sort to be used while elements are being inserted into the array (or vector or whatever it is you're using), or sometimes when the array is "almost sorted". For an unsorted array, it's better to use quick sort or merge sort if it needs to be stable. Note, however, that insertion sort is actually faster when sorting while inserting.
you can not initialize variables when you declare them if you'll initialize them later anyway

Yes you can. cf. my code.
as long as you don't use them first.

... no error occurs, right. But tomorrow you will add something between declaration and usage. And if not you, the poor soul who has to maintain the code. And then you either can have a reproducable bug (if you initialized the variable upon declaration) or you can have a non-reproducable bug (the other case). Choose yourfelf.
You might say I'd be a smart ass. Perhaps you are right. In the above code, nothing will be introduced and it won't be maintained. But it isn't that hard to initialize a variable when it is declared. And it has helped me avoid quite some debugging effort.

I'm also pondering his definition of "better loop".

I find it obvious that a loop is better if you can't put an error in it. But I would be interested in any argument against the stl's version.

And finally (to make things interesting) the 65536$-question:

it's a sort to be used [...] when the array is "almost sorted"

Let's assume the array has size n. How many inversions are needed that (in the average case) this version of insertion sort will be faster than merge sort (we have to stay with stable algorithms!)?
Yes you can. cf. my code.

I knew this would happen. I didn't say "you cannot/can't initialize variables when you declare them". I applied the operator can to the expression "not initialize variables when you declare them" to indicate that it is possible to not initialize variables when you declare them.

I find it obvious that a loop is better if you can't put an error in it. But I would be interested in any argument against the stl's version.

I'd say that loop L1 is 9001*milliseconds_saved times better than loop L2

I'm not entirely sure of how I'm supposed to take your question, but that particular insertion sort will never be faster than merge sort on average.
Also, you misquoted me:
it's a sort to be used [...] sometimes when the array is "almost sorted".

If you chose insertion sort based on the fact that, on average, your data source will provide almost sorted data then, on average, that insertion sort may, in fact, be faster than merge sort.
In any case, this analysis is pointless, since, as we all know, quick sort is teh sorting algorithm. Specially when sorting over 9000 elements.
Thanks guys. I kinda get the idea.

anyways, the 'key' works like 'temp'?
am i right?
Topic archived. No new replies allowed.