How to delete in a heap data structure?
I understand how to delete the root node from a max heap but is the procedure for deleting a node from the middle to remove and replace the root repeatedly until the desired node is deleted?
Is O(log n) the optimal complexity for this procedure?
Does this affect the big O complexity since other nodes must be deleted in order to delete a specific node?
Actually, you can remove an item from the middle of a heap without trouble.
The idea is to take the last item in the heap and move it into the vacated position (i.e. the position that held the item you deleted). Then, starting from that position, sift it up if it is greater than the parent of the old item. If it's not greater than the parent, sift it down.
That's the procedure for a max heap. For a min heap, of course, you'd reverse the greater and less cases.
Finding an item in a heap is an O(n) operation, but if you already know where it is in the heap, removing it is O(log n).
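Here's a minimal sketch of that procedure in Python, assuming the max heap is stored in a plain list with the children of index i at 2i+1 and 2i+2. The names sift_up, sift_down and remove_at are just ones I picked for illustration; they are not taken from the linked source.

```python
def sift_up(heap, i):
    """Move heap[i] up until its parent is at least as large (max heap)."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


def sift_down(heap, i):
    """Move heap[i] down until neither child is larger (max heap)."""
    n = len(heap)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and heap[child] > heap[largest]:
                largest = child
        if largest == i:
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def remove_at(heap, i):
    """Remove and return the item at index i in O(log n)."""
    removed = heap[i]
    last = heap.pop()              # detach the last leaf
    if i < len(heap):              # the removed slot was not the last leaf
        heap[i] = last             # the last item fills the hole
        if i > 0 and heap[i] > heap[(i - 1) // 2]:
            sift_up(heap, i)       # greater than the old item's parent
        else:
            sift_down(heap, i)
    return removed


heap = [10, 3, 9, 1, 2, 8, 7]      # a valid max heap
remove_at(heap, 3)                 # delete the 1; the 7 replaces it and moves up
print(heap)                        # [10, 7, 9, 3, 2, 8]
```

Only one of the two sift calls ever does any work, so the cost stays bounded by the height of the tree.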
I published a heap-based priority queue for DevSource a few years back. The full source is at http://www.mischel.com/pubs/priqueue.zip
Update
Several people have asked whether the replacement node can ever need to move up after the last node in the heap is moved into the deleted node's place. Consider this min heap:
        1
      6   2
     7 8 3
If you delete the node with value 7, the value 3 replaces it:
        1
      6   2
     3 8
You now have to move it up to make a valid heap:
        1
      3   2
     6 8
The key here is that if the item you're replacing is in a different subtree than the last item in the heap, it's possible that the replacement node will be smaller than the parent of the replaced node.
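The example above can be replayed in a few lines of Python. This is only a sketch for a min heap stored in a list (so the comparisons are reversed relative to the max heap case); remove_at_min is a name I made up for illustration.

```python
def remove_at_min(heap, i):
    """Remove and return heap[i] from a list-based min heap."""
    removed = heap[i]
    last = heap.pop()
    if i == len(heap):             # we removed the last leaf itself
        return removed
    heap[i] = last

    # Sift up while the item is smaller than its parent.
    while i > 0 and heap[i] < heap[(i - 1) // 2]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

    # Sift down while a child is smaller (a no-op if the item moved up).
    n = len(heap)
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and heap[child] < heap[smallest]:
                smallest = child
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
    return removed


heap = [1, 6, 2, 7, 8, 3]          # the heap from the example, level by level
remove_at_min(heap, 3)             # delete the 7; the 3 replaces it and moves up
print(heap)                        # [1, 3, 2, 6, 8]
```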
The problem with removing an arbitrary element from a heap is that you cannot find it efficiently.
In a heap, looking for an arbitrary element is O(n), thus removing an element [if given by value] is O(n) as well.
If it is important for you to remove arbitrary elements from the data structure, a heap is probably not the best choice. You should consider a fully sorted data structure instead, such as a balanced BST or a skip list.
If your element is given by reference, however, it is possible to remove it in O(log n) by simply 'replacing' it with the last leaf [remember a heap is implemented as a complete binary tree, so there is a last leaf, and you know exactly where it is], removing that leaf, and re-heapifying the relevant sub-heap.
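One way to get removal "by reference" with a list-based heap is to keep a side table that maps each item to its current index. Below is a rough sketch of that idea for a max heap. The class IndexedMaxHeap and its methods are hypothetical names, and it assumes the items are unique and hashable.

```python
class IndexedMaxHeap:
    """Max heap that tracks each item's index, so removal needs no search.

    Assumption: items are unique and hashable.
    """

    def __init__(self):
        self._data = []
        self._pos = {}             # item -> current index in _data

    def _swap(self, i, j):
        d = self._data
        d[i], d[j] = d[j], d[i]
        self._pos[d[i]] = i
        self._pos[d[j]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._data[i] <= self._data[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._data)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._data[child] > self._data[largest]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def push(self, item):
        if item in self._pos:
            raise ValueError("duplicate item")
        self._data.append(item)
        self._pos[item] = len(self._data) - 1
        self._sift_up(len(self._data) - 1)

    def remove(self, item):
        """Remove item in O(log n); raises KeyError if it is absent."""
        i = self._pos.pop(item)    # O(1) lookup instead of an O(n) scan
        last = self._data.pop()    # the last leaf
        if i < len(self._data):    # item was not the last leaf
            self._data[i] = last
            self._pos[last] = i
            self._sift_up(i)
            self._sift_down(self._pos[last])

    def pop_max(self):
        top = self._data[0]
        self.remove(top)
        return top


h = IndexedMaxHeap()
for x in [5, 3, 8, 1, 9, 2]:
    h.push(x)
h.remove(3)                        # no search needed
print([h.pop_max() for _ in range(5)])   # [9, 8, 5, 2, 1]
```

The trade-off is extra memory and bookkeeping on every swap, which is the price of skipping the O(n) search.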
If you have a max heap, you could implement this by assigning the item to be deleted a value larger than any other possible value (e.g. something like int.MaxValue or inf in whichever language you are using), then re-heapifying so that it becomes the new root. Then perform a regular removal of the root node.
This will cause another re-heapify, but I can't see an obvious way to avoid doing it twice. This suggests that perhaps a heap isn't appropriate for your use-case, if you need to pull nodes from the middle of it often.
(For a min heap, you can obviously use int.MinValue or -inf or whatever.)
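A quick sketch of this trick in Python, assuming a list-based max heap of numbers that never contains infinity itself. The function name remove_via_root is made up for illustration.

```python
import math


def remove_via_root(heap, i):
    """Delete heap[i] by raising it to +inf, then removing the root."""
    removed = heap[i]
    heap[i] = math.inf             # assumption: no real key equals inf

    # First re-heapify: the inf key rises all the way to the root.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

    # Second re-heapify: an ordinary root removal.
    last = heap.pop()
    if heap:                       # something other than the root remains
        heap[0] = last
        n = len(heap)
        i = 0
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and heap[child] > heap[largest]:
                    largest = child
            if largest == i:
                break
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest
    return removed


heap = [10, 3, 9, 1, 2, 8, 7]
remove_via_root(heap, 3)           # delete the 1
print(heap)                        # [10, 7, 9, 3, 2, 8]
```

Both passes are bounded by the height of the tree, so the whole thing is still O(log n) once you know the index; it just does roughly twice the work.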