This is an external merge sort. Instead of arrays, we use three additional files for temporary storage.
A is our original file to be sorted, B, C, and D are the three additional files .
Here we start with an initial run of two numbers. And we never have to hold more that two numbers in memory.
Instead of starting with very short runs, usually a hybrid algorithm is used, where the initial pass will read many records into memory, do an internal sort to create a long run, and then distribute those long runs onto the output set. The step avoids many early passes. For example, an internal sort of 1024 records will save 9 passes. - wiki |
Pass 1. Merge pairs of records from A; writing two-record sublists alternately to C and D.
Say A contains 3,2,6,4,1,8,5,7.
Read the first pair of integers from A (3,2), write them into C in order (2,3)
Read the next pair of integers from A (6,4), write them into D in order (4,6)
Read the third pair of integers from A (1,8), write them into C in order (1,8)
Read the next pair of integers from A (5,7), write them into D in order (5,7)
At the end of this step,
C contains 2,3 1,8
D contains 4,6 5,7
Pass 2. Merge two-record sublists from C and D into four-record sublists; writing these alternately to A and B.
Merge (2,3) from C and (4,6) from D into A (2,3,4,6)
Merge (1,8) from C and (5,7) from D into B (1,5,7,8)
Pass 3. Merge four-record sublists from A and B into eight-record sublists; writing these alternately to C and D
Merge (2,3,4,6) from A and (1,5,7,8) from B into C (1,2,3,4,5,6,7,8)
In this small example, we have only eight numbers and this merge completes the sort
If we had more numbers:
4. Repeat until you have one list containing all the data, sorted --- in log2(n) passes.