I'm trying out my own implementations of a few sorting algorithms/containers and comparing their performance through benchmarks. I've found that my binary tree sort (that is, the time it takes to insert every element into the tree) consistently performs worse than the std::sort implementation in VS2012. This holds whether the data set has 50 elements, 500,000 elements, or 5 million elements; all elements are randomly generated integers.
Is this performance difference just a result of the algorithms themselves, or is it a result of my implementation? (Source below)
.h:
http://pastebin.com/i51Jq1Rh
.inl:
http://pastebin.com/zs6gaU1z
GitHub:
https://github.com/tylercamp/SortMark
The sorting itself starts from within the Run method.
I feel as though part of the overhead in my implementation is the amount of recursion going on, which I don't think happens to the same degree inside std::sort. That suspicion comes from std::sort working over a linear data set while my sort works over, well, a tree.
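To illustrate, the kind of recursive insert I mean looks roughly like this (a simplified sketch, not my actual code; see the links above for that):

struct Node
{
    int value;
    Node* left;
    Node* right;

    explicit Node(int v) : value(v), left(nullptr), right(nullptr) { }
};

// Each insert walks down from the root and recurses once per level,
// touching a separately heap-allocated node at every step.
Node* Insert(Node* root, int value)
{
    if (!root)
        return new Node(value);              // one heap allocation per element

    if (value < root->value)
        root->left = Insert(root->left, value);
    else
        root->right = Insert(root->right, value);

    return root;
}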
EDIT: Here are the results of one of my benchmarks:

Element count: 5000000

Name             | Validation | Time (seconds)
Binary Tree Sort | Succeeded  | 10.868
STL std::sort    | Succeeded  | 1.619
EDIT2:
After showing this to a friend, he pointed out that my code does a lot of dereferences. Cache misses are definitely going to be an issue, and since the quicksort in std::sort works in place on a contiguous block of memory (the data is in a std::vector), it has the upper hand. I'll benchmark against a std::list instead and see how things go.
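Roughly, the std::sort side of the comparison looks like this (an illustrative sketch only; the real harness is in the SortMark repo linked above):

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

int main()
{
    std::vector<int> data(5000000);
    std::mt19937 rng(12345);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = static_cast<int>(rng());

    auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());      // in place over contiguous memory
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    (void)seconds;                             // the tree insert gets timed the same way
}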
EDIT3:
std::sort doesn't work with std::list, since std::sort requires random-access iterators; instead I pitted my tree against std::list::sort, which my binary tree beat in about half the time. That's promising, but list::sort isn't the same algorithm, so I'll see if I can find a way to either put std::sort at the same cache disadvantage, or get my binary tree working with contiguous memory.
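For clarity, this is the constraint I ran into (the function name here is just for illustration):

#include <algorithm>
#include <list>
#include <vector>

// std::sort needs random-access iterators, which std::list doesn't provide;
// std::list instead has its own member sort() (typically a merge sort over nodes).
void SortBoth(std::vector<int>& vec, std::list<int>& lst)
{
    std::sort(vec.begin(), vec.end());    // fine: vector iterators are random access
    // std::sort(lst.begin(), lst.end()); // won't compile: list iterators are bidirectional
    lst.sort();                           // member function sort
}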
EDIT4:
I modified the node reserve so that instead of holding pointers to nodes allocated off the heap, it simply contains node objects stored in contiguous memory (a sketch of the idea is at the bottom of this post). This cut the sorting time for 5 million elements in half, but not enough to outpace std::sort:
Element count: 5000000

Name             | Validation | Time (seconds)
Binary Tree Sort | Succeeded  | 5.967
STL std::sort    | Succeeded  | 0.707883
(Just realized that this test was in Release mode while the previous results were from a Debug build. The ratio is actually worse in this scenario, possibly due to all of the pass-by-value.)
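For reference, the contiguous-node change is roughly the following idea (a sketch under my own assumptions about the change, not the exact code in the repo):

#include <cstddef>
#include <vector>

// Keep tree nodes in contiguous memory: store them in a vector and link
// children by index rather than allocating each node separately on the heap.
struct PoolNode
{
    int value;
    int left;    // index into the pool, -1 means no child
    int right;
};

class PooledTree
{
public:
    void Reserve(std::size_t count) { m_nodes.reserve(count); }

    void Insert(int value)
    {
        PoolNode node = { value, -1, -1 };
        m_nodes.push_back(node);

        int newIndex = static_cast<int>(m_nodes.size()) - 1;
        if (newIndex == 0)
            return;                        // first node becomes the root

        int current = 0;
        for (;;)
        {
            int& child = (value < m_nodes[current].value)
                ? m_nodes[current].left
                : m_nodes[current].right;
            if (child == -1)
            {
                child = newIndex;          // attach the new node here
                return;
            }
            current = child;
        }
    }

private:
    std::vector<PoolNode> m_nodes;
};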