Help with "very fast" radix sort

Forum

Forum
General C++ Programming
Help with "very fast" radix sort

Help with "very fast" radix sort

Hi, I'm attempting to build a column based database, and I'm new to C++ (just wanted to play around with building a column base database "for the fun of it"). I want to construct a very fast radix sort, that would allow me to quickly sort groups of columns based on integer values. My general preference is to take up more RAM to get more performance.

I'd like to build the radix sort by allowing 256 "buckets" to drop values in as I'm sorting. This allows me to sort through a group of 4 byte integers in only 4 passes. But assuming I'm trying to sort a large group of values (say 50+ million), I'm not sure what type of container to use for these. Also note I'm pretty unfamiliar with the "standard library", so below are my thoughts:

Vectors: Pros: Easy to use, and very fast for sequential and random access inserts / reads
Cons: If they have to dynamically resize because a given vector wasn't large enough, this can apparently really slow performance. Unless I make another pass over the numbers before I start sorting, I wouldn't know how big to make individual the individual vectors. This means I either have to make them "too big" and waste space, or pay a performance price for either resizing, or scanning data first.

Lists: Pros: Seems like I wouldn't have to specify size ahead of time, so I could just easily insert values to a given list. Also, since I don't need random access reads (I'll ready the "0" list sequentially, then the "1" list, etc. they should work fine.
Cons: I don't really know much about lists, but I'm not sure how easy it is to append a new value to the end of a list. I've read that standard library lists include both "forward" and "backward" pointers, which I don't need. Also, I find it hard to believe that there isn't some time taken up with memory allocation. If I build a list and append x million records in it, is it calling memory allocation routines x million times?

Or maybe there's another container type I should learn?

Again, my goal is to make this "fast", not "memory efficient". But having said that, the fastest way I could think of (use 256 vectors, each sized equal to the total number of members to be sorted) is just too much memory to burn - 256 times a vector big enough to hold millions of elements is too much.

Any ideas are greatly appreciated! And truthfully, I'd also prefer simplicity over something "too" complicated...

Thanks!
Scott

TheIdeasMan (6817)

Hi Scott,

There is std::unordered_set , I haven't personally used it, but there are a number of experts here who have. Apparently it uses hashing, which is efficient for retrieval, not sure how that would go efficiency wise with sorting.

Should be able to find a comparison table for all the std containers, to compare insertion, retrieval, sorting etc.

http://www.cplusplus.com/reference/unordered_set/unordered_set/

Smac89 (1727)

std::set or std::map

Each of the above containers makes use a balanced red_black tree to sort it's values which means that retrieval of any element takes Olg₂N time in the worst case. std::map acts as a symbol table whereas std::set is more of a container (which is what I think you are going for)

I've never implemented radix sort, but wikipedia provides C code that implements it:
http://en.wikipedia.org/wiki/Radix_sort#Example_in_C

understand the code before using it.

Using std::vector is advantageous but it comes at the cost of expensive copying. i.e. when a vector is full, all it's contents are copied to a new location and any pointers you might have had to the original are invalidated. Also on systems that have low memory, having to allocate this large amount of space might not be possible

std::deque is another standard container which allows random access, it doesn't copy it's contents to a new location when full, instead it allocates new memory where more push operations are done. Which means that in the end, a deque is made up of small chunks of memory. Disadvantage of deque is that the random access is a little slower than vector because it needs to calculate the exact location due to the way it allocates memory.

Since memory is not a problem for you, vector might be the container to go with. Either store all the values you want in std::set then copy them to a vector or store all values in a vector and radix_sort the vector

Last edited on

SPowell42 (3)

Hi TheIdeasMan,

I like the idea, but it's not quite what I'm looking for. The radix sort works like this (using base 10 instead of binary, and assuming values of 1 through 100) - look at the "ones" digit of a number, and put all of the numbers that have matching digits into a "bucket". Then pull those off sequential and repeat with the "tens" digit.

So if my list of numbers is:
53, 42, 97, 18, 31, 11

On the first pass through, I get the following in the 10 buckets:

Ones bucket: 31, 11 (both 31 and 11 have ones digit = "1" so they go into this bucket. But they are not "sorted", I just stuff them into the bucket in the order I hit them while stepping through the list of items to sort)
Twos bucket: 42
Threes bucket: 53
Sevens bucket: 97
Eights bucket: 18
Buckets for 0, 4, 5, 6, and 9 are empty

Note that I don't do any "sorting" here, I just lump the numbers into a bucket based on their "ones" digit. I then will pull the numbers off in order, starting with the "ones" bucket. So at the end of the "pull" I have:

31, 11, 42, 53, 97, 18

Then I repeat the exact same process with the "tens" digit. The results of that are:

Ones bucket: 11, 18 (because tens digit of 11 and 18 is a "1")
Threes bucket: 31
Fours bucket: 42
Fives bucket: 53
Nines bucket: 97

When I pull these off in bucket order, the sorted result is:

11, 18, 31, 42, 53, 97

Note that I don't have to access the buckets in any sort of random order, I just put the numbers into them, and then pull them off sequentially. Also in my case I'm using binary digits and groups of 8 bits instead of base 10 numbers.

For me, the "perfect" container for this would:
1) Be able to store millions of items but be dynamically sized (as you saw in the example above, some buckets had multiple values, while others were empty)
2) Be very fast to insert into and pull off of (note, I do not need random access, I just pull the values off of them in the same order they were added to the container)
3) Have no delays due to resizing, memory allocation, etc.
4) Not take up a lot of space on container "internals" (i.e. for doubly linked lists, there are two pointers for every member)

Just not sure what's the best way to go. Unordered sets look like they are doing their own type of "hashing" or searching, so not quite what I'm looking for. Thanks for the idea though - these may be very useful for other things I'll need!

Thanks,
Scott

SPowell42 (3)

Hi Smac89,

Appreciate the ideas, but also not quite what I'm looking for. I need a container that's essentially linear time. Doing fancy stuff like balancing trees, etc. in the containers just wastes cycles, because each container doesn't need to do any sorting. I just need to be able to stuff things in and pull them back out in the same order.

I'm seriously considering just building a linked list, but having the "head" node hold a pointer to both the "first" and "last" elements in the list. The "last" element would point to the end of the list, and make it linear time to insert a new item on the end. The "first" would just be used when I'm all done pushing stuff on and ready to start pulling off.

Do you know if the memory allocation in a linked list, especially if you're adding elements one by one (and there may be MILLIONS of items to add), takes a significant amount of time?

Thx,
Scott

Topic archived. No new replies allowed.