Optimising Merge Sort

Apr 4, 2010 at 9:21am
For a bit of fun I coded my own implementation of the merge sort algorithm, after watching a lecture from ocw.mit.edu (really cool site!) which discussed it.

My code is below (minus a main which just populates a list randomly and calls mergesort on it). I tested it in comparison to insertsort (which I also coded) and it works much faster on my machine for large lists (N > 1000 or something), as expected.

My question: Does anyone have any suggestions for optimising it further?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#define SUCCESS 1

template <typename T>
int mergesort(T * list, const int numElements)
{
	// If our list has 0 or 1 elements, return (i.e. already sorted)
	if(numElements < 2)
		return SUCCESS;

	// We need to 'divide and conquer' our list
	// Calculate the number of elements we'll need in each of our two smaller lists
	int numInBottom = (int)ceil((double)numElements/2.0);	// Any way to do this better with ints?
	int numInTop = numElements - numInBottom;

	// Run merge sort on both
	mergesort<T>(list, numInBottom);	// Bottom list
	mergesort<T>(list + numInBottom, numInTop);		// Top list

	// Merge the results
	merge<T>(list, list+numInBottom, numInBottom, numInTop);

	return SUCCESS;
}

// Merges two sorted lists into one. Places the result at the start of the 'bottom' list.
// After returning successfully, the 'bottom' list has (numInBottom + numInTop) elements.
// Assumes parameters 'bottom' and 'top' are pre-sorted.
template <typename T>
int merge(T * bottom, T * top, const int numInBottom, const int numInTop)
{
	// Indexers for bottom and top lists
	int bIndex = 0, tIndex = 0;

	// Create a new collection to store our result temporarily
	T * result = new T[numInBottom+numInTop];

	while((bIndex < numInBottom) && (tIndex < numInTop))
	{
		// Find which of our current pair of elements should be put into our result next
		if(bottom[bIndex] < top[tIndex])
			result[(bIndex++)+tIndex] = bottom[bIndex];
		else
			result[bIndex+(tIndex++)] = top[tIndex];
	}

	// At this point one of the lists is empty.
	// Concatenate what's left of the other one onto the end of our result.
	if(bIndex < numInBottom)
		memcpy(result + (bIndex+tIndex), bottom + bIndex, sizeof(T)*(numInBottom - bIndex));
	else
		memcpy(result + (bIndex+tIndex), top + tIndex, sizeof(T)*(numInTop - tIndex));

	// Copy our result back to the original list
	memcpy(bottom, result, sizeof(T)*(numInTop+numInBottom));

	// Deallocate our dynamically-allocated memory
	delete [] result;

	return SUCCESS;
}
Apr 4, 2010 at 10:05am
does adding a thread reduce the complexity from n log n to n/2 * log n ?
Apr 4, 2010 at 10:37am
Hmmmm...I'm not sure. Anyone? Nice idea though, I might see if I can find something on that.

I remember reading on this site somewhere that my style on line 12 (converting to double then back to int) is to be avoided generally. If I have two ints a and b, what's the best way of getting ceil (a/b)? There's no overload for ceil that takes two ints... :(
Apr 4, 2010 at 11:29am
No need for ceil here, or for any other bit magic. Simply dividing with 2 will do:
int numInBottom = numElements/2;
Most compilers will further optimize this with bitshifting int numInBottom = numElements>>1;.

EDIT: I'm not sure how safe it is to use memcpy. Memcpy doesn't call constructors, it only makes binary copies of the object. If the object has dynamic memory management, (or maybe inheritance?), this sort function might break it. I recommend using std::copy() instead, which will boil down to memcpy() anyway wherever possible.
Last edited on Apr 4, 2010 at 11:39am
Apr 4, 2010 at 11:29am
Cant you just write (this equivalent to your ceil version, but R0mai is correct.)

 
((numElements%2 == 0) ? (numElements/2) : (numElements/2 + 1))


Sorry i do not have much time today, i will write some assumption later or tomorrow.

Maikel
Last edited on Apr 4, 2010 at 11:30am
Apr 4, 2010 at 12:04pm
http://www.cplusplus.com/reference/algorithm/merge/
This does exactly what your merge function does, only faster and safer.

I made this based on your original:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
template<typename T>
void mergesort(T * list, const int numElements) {
    if (numElements < 2)
        return;

    int numInBottom = numElements >> 1;
    int numInTop = numElements - numInBottom;

    mergesort<T> (list, numInBottom);
    mergesort<T> (list + numInBottom, numInTop);

    T * result = new T[numInBottom + numInTop];
    std::merge(list, list + numInBottom, list + numInBottom, list + numInBottom + numInTop, result);
    std::copy( result, result + numInBottom + numInTop, list );

    delete [] result;
}


It runs ~5% faster than your original.
I think the bottleneck might be in the dynamic memory allocation.

Further improvement :
Using iterators (as in STL algorithms) instead of pointers will make more general and easier to use.
Last edited on Apr 4, 2010 at 12:04pm
Apr 4, 2010 at 1:04pm
All great info...thanks guys!
Topic archived. No new replies allowed.