I'm using a Heap to keep a best-of list of items in a Matrix. In short, there are n² items in a matrix, of which a (small) part is "good" (value > 0). The good ones are added into a Priority Queue. At each iteration, a part of the Matrix is updated. Depending on the result of the update, the item is inserted, deleted or relocated in the Heap.
(e.g. if a "bad" item becomes "good", it is inserted; if a "good" item gets a different "good" value, it is relocated; etc.)
Since I can't (efficiently) search in a Heap, I keep a two-way link between elements in the Matrix and elements in the Heap.
My current implementation uses a Binary Heap of Element pointers, pointing to their corresponding item in the Matrix (a linearized vector). Each Element has a "Position" variable, that contains its index in the Heap (it's a vector implementation), or an error value if it's not in (thus "bad").
My questions are:
a) Is there a better way of doing this? I know when it comes to updating/deleting in a Heap, a Fibonacci Heap is supposedly much better, but my first implementation wasn't a success on test runs (slightly slower and massive memory usage compared to Binary Heaps). I might return to Fib Heaps later on.
My current annoyances is that each Element in the matrix also keeps its own indexes, as they are meaningful (think distance table). It seems a bit redundant that item (i, j) stores 'i' and 'j', but I can't think of a different way to "fetch" i and j (aside from using its address to calculate its position and so on, which seems stupid).
b) Deleting in a Binary Heap isn't really documented anywhere. My method is to simply swap the last item into the emptied spot, and then check if it has to be up- or downshifted. I'm not entirely sure if this is guaranteed to be correct. I don't see why it wouldn't be, but I've learned not to trust my instinct on these things.
c) Is there are a more efficient way of updating a Heap element, as in a generalized decreaseKey that could go either way? Or would I be better off doing delete->insert (given that delete works)?
Any input is welcome. Basically, I just feel like I'm not doing it the way I should be doing, but I can't think of a better way.
First thought after posting:
If, instead of pointers, I keep (i, j) pairs in the Heap, I can look up the items easily. Since I only need to know i and j when coming from the Heap, this shouldn't affect anything logic-wise.
The benefit would be saving space (the Heap is much smaller than the Matrix), at the cost of increased Swapping costs (copying two ints vs coping a single pointer).
So you are using a priority Queue (your heap structure), however, you are performing random access operations (implemented through a two-way link) on said heap, is this correct?
Are you using your heap for Graphs? It almost sounds like (because of your need to insert/delete/relocate from not the root) that the heap may not be the structure that will benefit you the most, for what you are trying to do. Is there a reason you have chosen heap and not something else, just curious?
> a) I know when it comes to updating/deleting in a Heap, a Fibonacci Heap is supposedly much better, but my first implementation wasn't a success on test runs (slightly slower and massive memory usage compared to Binary Heaps).
That has also been my general experience - A Fibonacci Heap is faster only if N is really large.
> b) Deleting in a Binary Heap isn't really documented anywhere. My method is to simply swap the last item into the emptied spot, and then check if it has to be up- or downshifted
Yes. Worst case O(log N).
c) Is there are a more efficient way of updating a Heap element, as in a generalized decreaseKey that could go either way? Or would I be better off doing delete->insert (given that delete works)?
It can be done similar to 'My method is to simply swap the last item into the emptied spot, and then check if it has to be up- or downshifted'. Instead, update the element and then check if it has to be up- or downshifted. http://cplusplus.com/forum/general/68395/#msg364944
> The benefit would be saving space (the Heap is much smaller than the Matrix), at the cost of increased Swapping costs (copying two ints vs coping a single pointer).
I guess an extra swap of an int log N times shoudn't be noticeably slower.
@JLBorges:
a) I found a Fib Heap to be slower for any number of N (that is: the performance gap between the Binary heap and Fib heap increased slowly with size of N). The thing is, Fibonacci Heaps strongly reduce the amount of comparisons required for all operations. However, in many cases (like mine), you're just comparing two integers, which is very cheap compared to the overhead caused by the pointer arithmetic and terrible cache behaviour in Fib heaps.
(It could be that my implementation wasn't perfect, but I found more reports that confirmed my results than the opposite.)
b) That's what I thought. Curious that nobody mentioned this on any of the Stackoverflow topics I found on random deletes.
c) Yeah, if the answer to b) was "Yes, it's correct", then I figured this would work here as well. Thanks for the confirmation!
@clanmjc:
Yes. That's the odd thing on all Data Structure papers: all of them boast on "See! Operation X in O(1)!", but they all assume that the input (often an element that could be anywhere in the Heap) is already given. Since there is no way to search for a particular item in the Heap, you need some other way of knowing which is where. I use the two-way link:
When performing operations on the heap (generally extractMin()), I can quickly find the Element it relates to (through pointer, or in this case (i, j) index pair). This saves me the trouble of keeping satellite data in the Heap (as it would slow down copying, which is done a lot).
When performing operations on the matrix (generally an updateValue()), I can quickly see where it is in the Heap (through pointer or position int) and call the necessary function (shift-up, shift-down, delete, etc). In case of an Insert, I can easily make a new Heap element, as I know the Element it relates to (in case of pointer) or its indeces, since I'm travelling through the matrix.
If you have a suggestion for other Data Structures to support this need, I'd welcome the advise. All I need is:
-Access to the best item.
-A way to add, update and delete items.
If it matters: changes are expected to much more frequent than extractMin().
Also, something I'm struggling with in case of (Binary) Heaps: Sometimes it's necessary to skip a few items (unknown number). Since (Binary) Heaps only keep track of the best item, this means I have no choice but to extractMin() until a condition is met, and then reinsert all the previous ones. Since all of these are the top N items, reinserting costs a guaranteed log(n) operations. I'm wondering if there's a smarter way?