Heaps and time complexity

Hi guys,

So below I have created a min-heap.

Anyhow, from what I have read, polling or getting the min element (or max, depending on the type of heap) should take O(1), constant time.

But this doesn't seem to be the case, at least not for my heap. Yes, it does take O(1) to get a number from a vector, for example int a = vec.at(0);

But that just isn't the case with a heap. First we have to get the number, which is indeed O(1), but then we have to call heapifyDown(), which brings the new top of the heap down until it finds a suitable placement, and this requires recursion or iteration, possibly multiple times depending on how big the heap is.

So why is it said that removing from a heap is O(1)?

*Note: I think they say insertion is also O(1), but we have to call heapifyUp(), which again entails iteration.

thanks


#include <iostream>
#include <vector>

using namespace std;

class Heap{

 public:
     vector<int> h;
     int size;

     Heap(){ size = 0; }

     bool hasParent(int index){

         return index > 0;   // only the root (index 0) has no parent
     }

     bool hasLeftChild(int index){

        return (index*2+1 < size);
     }

     bool hasRightChild(int index){
       return (index*2+2 < size);
     }

     int parentIndex(int index){

         return (index-1)/2;   // works for both left (odd) and right (even) children
     }

     int leftChildIndex(int index){

        if(hasLeftChild(index))
           return (index*2)+1;
        else
            return -1;
     }

     int rightChildIndex(int index){
          if(hasRightChild(index))
            return (index*2)+2;
          else
            return -1;
     }

     int parent(int index){

        if(hasParent(index))
            return h.at(parentIndex(index));
        else
            return -1;
     }

     int leftChild(int index){

         if(hasLeftChild(index))
            return h.at(leftChildIndex(index));
         else
            return -1;
     }

     int rightChild(int index){

        if(hasRightChild(index))
            return h.at(rightChildIndex(index));
        else
            return -1;
     }

     void insert(int obj){

        h.push_back(obj);
        ++size;
        heapifyUp();   // bubble the new element up to its place: O(log n) worst case
     }

     void swap(int& a,int& b){

        int temp = a;
        a = b;
        b = temp;
     }

     void heapifyUp(){

        int index = size-1;   // start at the newly inserted element

        while(hasParent(index) && h.at(index) < parent(index)){

            int parIndex = parentIndex(index);
            swap(h.at(index),h.at(parIndex));
            index = parIndex;
        }
     }

     void heapifyDown(){

        int index = 0;

        while(hasLeftChild(index)){

            // pick the smaller child; the right child may not exist
            int smallestIndex = leftChildIndex(index);
            if(hasRightChild(index) && h.at(rightChildIndex(index)) < h.at(smallestIndex))
                smallestIndex = rightChildIndex(index);

            if(h.at(index) <= h.at(smallestIndex))
                break;   // heap property restored

            swap(h.at(index),h.at(smallestIndex));
            index = smallestIndex;
        }
     }

     int poll(){

        if(size == 0)
            return -1;

        int obj = h.at(0);
        h.at(0) = h.at(size-1);   // move the last element to the root
        h.pop_back();             // actually shrink the vector
        --size;
        heapifyDown();            // sink the new root to its place: O(log n) worst case
        return obj;
     }

};
"Getting" an extremum of a collection doesn't imply that you'll remove it from the collection. It's perfectly reasonable to just want to know the value of the extremum without wanting to remove it. That operation takes constant time for a heap.
Removing an element is a separate operation, and that does take longer.
Are you sure you're expected to have O(1) complexity on those operations? And at what (space) cost?
"Getting" an extremum of a collection doesn't imply that you'll remove it from the collection. It's perfectly reasonable to just want to know the value of the extremum without wanting to remove it. That operation takes constant time for a heap.
Removing an element is a separate operation, and that does take longer.


Very true! It would actually make sense to have a function that just gets the top without removing it.
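For the class above, that's just a bounds check plus one vector access. A minimal sketch (peek is an assumed name; it reuses poll()'s empty-heap sentinel):

int peek(){

    if(size == 0)
        return -1;    // same empty-heap sentinel poll() uses

    return h.at(0);   // no heapifyDown(), so this really is O(1)
}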

In the case of insertion, again, this isn't O(1), right? We still have to call heapifyUp().

https://en.wikipedia.org/wiki/Binary_heap - says insertion is O(1)
The structure behind the heap matters. The Wikipedia article is making reference to your traditional node-pointer binary heap.

Also, “insertion” does not include finding the insertion spot.

Inserting a node into a binary tree can be done by simply constructing a new node with the correct children.

struct node
{
  type value;
  node* left;
  node* right;
};

The constructor can easily insert a node into the tree in O(1) time:

C
node* create_node( type value, node* left, node* right )
{
  node* result = (node*)malloc( sizeof( node ) );
  result->value = value;
  result->left = left;
  result->right = right;
  return result;
}

C++
node::node( type value, node* left, node* right )
: value{value}, left{left}, right{right}
{ }

So the O(1) update operation for (a (b c) d) → (a (b c) (d e)) would simply be:

C
a_node->right = create_node( empty_value, // replace the "d" leaf with a branching node
  a_node->right,                          // the old "d" leaf
  create_node( e_value, NULL, NULL ) );   // the new "e" leaf 

C++
a_node->right = new node( empty_value,
  a_node->right,
  new node( e_value, nullptr, nullptr ) );

Again, finding the parent node is not O(1). (It is O(log n) for a balanced binary tree, or O(n) for a degenerate linked list.)
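For example, with no parent pointers there is nothing to do but search the whole tree for the child, which is what makes it O(n) in general. A sketch over the node struct above (find_parent is my name, not from this post):

C++
node* find_parent( node* root, node* child )
{
  if ( !root || root == child ) return nullptr;       // the root has no parent
  if ( root->left == child || root->right == child ) return root;
  node* p = find_parent( root->left, child );         // search the left subtree
  return p ? p : find_parent( root->right, child );   // then the right
}

An ordered, balanced tree lets you prune the search to a single root-to-leaf path, which is where the O(log n) comes from.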

Splitting hairs, it is.

[edit]
You, of course, are using the standard “heap” construct, in which inserts also require a sift-up (your heapifyUp()), so insertion cannot be O(1).
The Wikipedia article says insert's average case is O(1), but is that right?

The number of operations required [for insert] depends only on the number of levels the new element must rise to satisfy the heap property, thus the insertion operation has a worst-case time complexity of O(log n) but an average-case complexity of O(1).

It seems to me the average case would be half the height of the tree, so still O(log n).
Yes, just because it is on Wikipedia doesn’t mean you aren’t reading baloney. Though in fairness the author might have meant to say best case is O(1).
Hmm... In the average case the tree would be balanced or almost balanced, and therefore about half of the elements in the heap would only need to rise one level, a quarter would need to rise two levels, an eighth would need to rise 3 levels, etc. 1/2 + 2/4 + 3/8 + 4/16 + ... = 2, therefore the average time is bounded, and thus constant.
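For reference, that series really does sum to 2; a standard way to see it (a textbook power-series identity, not from the thread):

\[
\sum_{k=1}^{\infty} k x^{k} = \frac{x}{(1-x)^{2}} \quad (|x| < 1),
\qquad\text{so}\qquad
\sum_{k=1}^{\infty} \frac{k}{2^{k}} = \frac{1/2}{(1-1/2)^{2}} = 2 .
\]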
Well reasoned.
That makes sense. Is that a kind of "amortized time"?

I was just thinking of adding one element, where it would rise on average half the current height. But we are supposed to be analyzing the rate of increase as more and more elements are added. Obviously the levels increase more and more slowly, which is what bounds the average rise.
This program suggests that the actual average is around 1.25 to 1.3.

#include <iostream>
#include <vector>
#include <random>

class Heap
{
    std::vector<int> v;
    size_t cnt = 0;   // total number of sift-up swaps across all inserts

    static size_t parent (size_t i) { return (i - 1) / 2; } // pre: i > 0
    static size_t left   (size_t i) { return 2 * i + 1; }
    static size_t right  (size_t i) { return 2 * i + 2; }

public:
    double stat() const { return double(cnt) / v.size(); }   // average swaps per element

    void insert (int data)
    {
        v.push_back (data);
        size_t i = v.size() - 1;
        while (i > 0 && v[i] < v[parent (i)])
        {
            ++cnt;
            std::swap (v[i], v[parent (i)]);
            i = parent (i);
        }
    }
};

Heap rnd_heap (int size, int hi = 999999, int lo = 0)
{
    std::default_random_engine rnd {std::random_device{}()};
    std::uniform_int_distribution<> dist(lo, hi);
    Heap h;
    for (int n = size; n--; ) h.insert (dist(rnd));
    return h;
}

int main()
{
    Heap h {rnd_heap (10000)};
    std::cout << h.stat() << '\n';
}

Is that a kind of "amortized time"?
I don't think so. The average case tells you what happens when you're dealing with uniformly distributed random data. It's useful because it gives you a measure of how well the algorithm will perform without knowing anything about the input. The underlying assumption is that most applications will most of the time give inputs that are close to the average of the entire input space, but this assumption may be invalid in specific applications.
On the other hand, an amortized analysis should give you an answer that's independent of the input. It gives you a measure of, realistically, how you can expect the algorithm to perform even if your input is always the worst case.
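The textbook example of the difference is std::vector::push_back: a single call may trigger an O(n) reallocation, yet the total work over n pushes stays O(n), so the amortized cost per push is O(1) regardless of input. A quick sketch that counts the reallocation copies (the bookkeeping is mine; the exact growth factor is implementation-defined):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    long long copies = 0;                 // elements moved by reallocations

    for (int i = 0; i < 1000000; ++i)
    {
        if (v.size() == v.capacity())     // the next push_back reallocates...
            copies += v.size();           // ...copying/moving every element
        v.push_back(i);
    }

    // the total stays linear in n, so the per-push average is a small constant
    std::cout << "copies per element: " << double(copies) / v.size() << '\n';
}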

This program suggests that the actual average is around 1.25 to 1.3.
The methodology is incorrect. In the insertion loop inside rnd_heap you need to reset the counter, insert, then save or print counter/size. Theoretically, the curve should flatten out.

EDIT 2: Sorry, that's still wrong.
// assumes the Heap above gains size(), reset_counter() and get_counter() accessors
void rnd_heap (int size, int hi = 999999, int lo = 0){
    std::default_random_engine rnd {std::random_device{}()};
    std::uniform_int_distribution<> dist(lo, hi);
    const int loops = 1000;
    std::vector<double> stats(size);
    for (int i = loops; i--;){
        Heap h;
        for (int n = size; n--; ){
            auto index = h.size();
            h.reset_counter();
            h.insert (dist(rnd));
            stats[index] += double(h.get_counter()) / loops;  // average swaps at this size
        }
    }
    for (auto x : stats)
        std::cout << x << std::endl;
}