What is the intuition behind the Fibonacci heap data structure?
I've read the Wikipedia article on Fibonacci heaps and read CLRS's description of the data structure, but they provide little intuition for why this data structure works. Why are Fibonacci heaps designed the way they are? How do they work?
Thanks!
This answer is going to be pretty long, but I hope it helps provide some insight as to where the Fibonacci heap comes from. I'm going to assume that you're already familiar with binomial heaps and amortized analysis.
Motivation: Why Fibonacci Heaps?
Before jumping into Fibonacci heaps, it's probably good to explore why we even need them in the first place. There are plenty of other types of heaps (binary heaps and binomial heaps, for example), so why do we need another one?
The main reason comes up in Dijkstra's algorithm and Prim's algorithm. Both of these graph algorithms work by maintaining a priority queue holding nodes with associated priorities. Interestingly, these algorithms rely on a heap operation called decrease-key that takes an entry already in the priority queue and then decreases its key (i.e. increases its priority). In fact, a lot of the runtime of these algorithms is explained by the number of times you have to call decrease-key. If we could build a data structure that optimized decrease-key, we could optimize the performance of these algorithms. In the case of the binary heap and binomial heap, decrease-key takes time O(log n), where n is the number of nodes in the priority queue. If we could drop that to O(1), then the time complexities of Dijkstra's algorithm and Prim's algorithm would drop from O(m log n) to O(m + n log n), which is asymptotically faster than before. Therefore, it makes sense to try to build a data structure that supports decrease-key efficiently.
There is another reason to consider designing a better heap structure. When adding elements to an empty binary heap, each insertion takes time O(log n). It's possible to build a binary heap in time O(n) if we know all n elements in advance, but if the elements arrive in a stream this isn't possible. In the case of the binomial heap, inserting n consecutive elements takes amortized time O(1) each, but if insertions are interleaved with deletions, the insertions may end up taking Ω(log n) time each. Therefore, we might want to search for a priority queue implementation that optimizes insertions to take time O(1) each.
Step One: Lazy Binomial Heaps
To start off building the Fibonacci heap, we're going to begin with a binomial heap and modify it to try to make insertions take time O(1). It's not all that unreasonable to try this out - after all, if we're going to do a lot of insertions and not as many dequeues, it makes sense to optimize insertions.
If you'll recall, binomial heaps work by storing all of the elements in the heap in a collection of binomial trees. A binomial tree of order n has 2^n nodes in it, and the heap is structured as a collection of binomial trees that all obey the heap property. Typically, the insertion algorithm in a binomial heap works as follows:
- Create a new singleton node (this is a tree of order 0).
- If there is a tree of order 0:
- Merge the two trees of order 0 together into a tree of order 1.
- If there is a tree of order 1:
- Merge the two trees of order 1 together into a tree of order 2.
- If there is a tree of order 2:
- ...
This process ensures that at each point in time, there is at most one tree of each order. Since each tree holds a number of nodes exponential in its order, this guarantees that the total number of trees is small, which lets dequeues run quickly (because we don't have to look at too many different trees after doing a dequeue-min step).
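As a rough sketch of that carrying process (in Python, with a hypothetical minimal `Node` type and a `link` helper; real implementations differ in the details):

```python
class Node:
    """Minimal heap-ordered multiway tree node for these sketches (hypothetical)."""
    def __init__(self, key):
        self.key = key
        self.order = 0       # number of children
        self.children = []

def link(a, b):
    """Merge two trees of the same order: the larger-keyed root becomes a child."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    a.order += 1
    return a

def binomial_insert(trees_by_order, key):
    """Eager binomial insertion: keep merging equal orders, just like binary carrying."""
    carry = Node(key)
    while carry.order in trees_by_order:
        carry = link(carry, trees_by_order.pop(carry.order))
    trees_by_order[carry.order] = carry
```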
However, this also means that the worst-case runtime of inserting a node into a binomial heap is Θ(log n), because we might have Θ(log n) trees that need to get merged together. Those trees need to be merged together only because we need to keep the number of trees low when doing a dequeue step, and there's absolutely no benefit in future insertions to keeping the number of trees low.
This introduces the first departure from binomial heaps:
Modification 1: When inserting a node into the heap, just create a tree of order 0 and add it to the existing collection of trees. Do not consolidate trees together.
There is another change we can make. Normally, when we merge together two binomial heaps, we do a merge step to combine them together in a way that ensures that there is at most one tree of each order in the resulting heap. Again, we do this compression so that dequeues are fast, and there's no real reason why the merge operation ought to have to pay for this. Therefore, we'll make a second change:
Modification 2: When merging two heaps together, just combine all their trees together without doing any merging. Do not consolidate any trees together.
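Concretely, these two modifications make insert and merge almost trivial. Here's a minimal sketch, reusing the hypothetical `Node` from above and representing the heap as a plain Python list of tree roots plus a cached pointer to the minimum (a real implementation would use a circular linked list of roots so that the concatenation below is O(1)):

```python
class LazyHeap:
    """Sketch of a lazy binomial heap: a bag of heap-ordered trees plus the minimum root."""
    def __init__(self):
        self.roots = []          # arbitrary collection of tree roots, in no particular order
        self.min_root = None

    def insert(self, key):
        node = Node(key)                       # Modification 1: just add a new order-0 tree
        self.roots.append(node)
        if self.min_root is None or key < self.min_root.key:
            self.min_root = node
        return node                            # handle the caller keeps for decrease-key later

    def merge_with(self, other):
        self.roots.extend(other.roots)         # Modification 2: concatenate, don't consolidate
        if other.min_root is not None and (
                self.min_root is None or other.min_root.key < self.min_root.key):
            self.min_root = other.min_root
```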
With these changes in place, we pretty easily get O(1) performance on our enqueue operations, since all we're doing is creating a new node and adding it to the collection of trees. However, if we just make these changes and don't do anything else, we completely break the performance of the dequeue-min operation. Recall that dequeue-min needs to scan across the roots of all the trees in the heap after removing the minimum value so that it can find the smallest value. If we add in Θ(n) nodes by inserting them in this way, our dequeue operation will then have to spend Θ(n) time looking over all of these trees. That's a huge performance hit... can we avoid it?
If our insertions really just add more trees, then the first dequeue we do will certainly take Ω(n) time. However, that doesn't mean that every dequeue has to be expensive. What happens if, after doing a dequeue, we coalesce all the trees in the heap together such that we end up with only one tree of each order? This will take a long time initially, but if we start doing multiple dequeues in succession, those future dequeues will be significantly faster because there are fewer trees lying around.
There's a slight problem with this setup, though. In a normal binomial heap, the trees are always stored in order. If we just keep throwing new trees into our collection of trees, coalescing them at random times, and adding even more trees after that, there's no guarantee that the trees will be in any order. Therefore, we're going to need a new algorithm to merge those trees together.
The intuition behind this algorithm is the following. Suppose we create a hash table that maps from tree orders to trees. We could then do the following operation for each tree in the data structure:
- Look up and see if there's already a tree of that order.
- If not, insert the current tree into the hash table.
- Otherwise:
- Merge the current tree with the tree of that order, removing the old tree from the hash table.
- Recursively repeat this process.
This operation ensures that when we're done, there's at most one tree of each order. It's also relatively efficient. Suppose that we start with T total trees and end up with t total trees. Since each merge combines two trees into one, the total number of merges we'll end up doing will be T - t, and each merge takes time O(1). Therefore, the runtime for this operation will be linear in the number of trees (each tree is visited at least once) plus the number of merges done.
If the number of trees is small (say, Θ(log n)), then this operation will only take time O(log n). If the number of trees is large (say, Θ(n)), then this operation will take Θ(n) time, but will leave only Θ(log n) trees remaining, making future dequeues much faster.
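Here's a sketch of that consolidation pass, continuing the hypothetical `LazyHeap` above; the dictionary plays the role of the hash table from tree orders to trees:

```python
def consolidate(heap):
    """Modification 3: after a dequeue, merge trees until at most one of each order remains."""
    by_order = {}
    for tree in heap.roots:
        while tree.order in by_order:                     # already a tree of this order?
            tree = link(tree, by_order.pop(tree.order))   # merge them, then retry at the new order
        by_order[tree.order] = tree
    heap.roots = list(by_order.values())
    heap.min_root = min(heap.roots, key=lambda t: t.key, default=None)
```

A full dequeue-min would first remove `heap.min_root` from the root list, splice its children into `heap.roots`, and then run this pass.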
We can quantify just how much better things will get by doing an amortized analysis and using a potential function. Let our potential function Φ be the number of trees in the data structure. This means that the costs of the operations are as follows:
- Insert: Does O(1) work and increases the potential by one. Amortized cost is O(1).
- Merge: Does O(1) work. The potential of one heap is dropped to 0 and the other heap's potential is increased by a corresponding amount, so there is no net change in potential. The amortized cost is thus O(1).
- Dequeue-Min: Does O(#trees + #merges) work and decreases the potential down to Θ(log n), the number of trees we'd have in a binomial heap if we were eagerly merging the trees together. We can account for this in a different way. Let's have the number of trees be written as Θ(log n) + E, where E is the "excess" number of trees. In that case, the total work done is Θ(log n + E + #merges). Notice that we'll do one merge per excess tree, and so the total work done is Θ(log n + E). Since our potential drops the number of trees from Θ(log n) + E down to Θ(log n), the drop in potential is -E. Therefore, the amortized cost of a dequeue-min is Θ(log n).
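Written out, that last bullet is just the usual amortized-cost identity, using the actual cost and the potential change from above:

```
amortized cost  =  actual cost + ΔΦ
                =  Θ(log n + E) + (-E)
                =  Θ(log n)
```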
Another intuitive way to see why the amortized cost of a dequeue-min is Θ(log n) is by looking at why we have surplus trees. These extra trees are there because those darned lazy inserts are making all these extra trees and not paying for them! We can therefore "backcharge" the cost associated with doing all the merges back to the individual insertions that took up all that time, leaving behind the Θ(log n) "core" operation and a bunch of other operations that we'll blame on the insertions.
Therefore:
Modification 3: On a dequeue-min operation, consolidate all trees to ensure there's at most one tree of each order.
At this point, we have insert and merge running in time O(1) and dequeues running in amortized time O(log n). That's pretty nifty! However, we still don't have decrease-key working yet. That's going to be the challenging part.
Step Two: Implementing Decrease-Key
Right now, we have a "lazy binomial heap" rather than a Fibonacci heap. The real change between a binomial heap and a Fibonacci heap is how we implement decrease-key.
Recall that the decrease-key operation should take an entry already in the heap (usually, we'd have a pointer to it) and a new priority that's lower than the existing priority. It then changes the priority of that element to the new, lower priority.
We can implement this operation very quickly (in time O(log n)) using a straightforward algorithm. Take the element whose key should be decreased (which can be located in O(1) time; remember, we're assuming we have a pointer to it) and lower its priority. Then, repeatedly swap it with its parent node as long as its priority is lower than its parent, stopping when the node comes to rest or when it reaches the root of the tree. This operation takes time O(log n) because each tree has height at most O(log n) and each comparison takes time O(1).
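A sketch of that straightforward O(log n) version, assuming each node also carries a `parent` pointer (which the earlier sketches didn't need) and that we simply swap keys on the way up:

```python
def decrease_key_by_sifting(node, new_key):
    """O(log n) decrease-key: lower the key, then bubble it up by swapping with parents."""
    assert new_key <= node.key, "decrease-key must not increase the key"
    node.key = new_key
    while node.parent is not None and node.key < node.parent.key:
        node.key, node.parent.key = node.parent.key, node.key   # swap keys with the parent
        node = node.parent
```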
Remember, though, that we're trying to do even better than this - we want the runtime to be O(1)! That's a very tough bound to match. We can't use any process that will move the node up or down the tree, since those trees have heights that can be Ω(log n). We'll have to try something more drastic.
Suppose that we want to decrease the key of a node. The only way that the heap property will be violated is if the node's new priority is lower than that of its parent. If we look at the subtree rooted at that particular node, it will still obey the heap property. So here's a totally crazy idea: what if whenever we decrease the key of a node, we cut the link to the parent node, then bring the entire subtree rooted at the node back up to the top level of the tree?
Modification 4: Have decrease-key decrease the key of a node and, if its priority is smaller than its parent's priority, cut it and add it to the root list.
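In code, and ignoring the bookkeeping problems we're about to run into, that looks something like this (still the hypothetical `LazyHeap` sketch, with `parent` pointers as above):

```python
def decrease_key_by_cutting(heap, node, new_key):
    """Modification 4 (no cascading cuts yet): cut the whole subtree and make it a root."""
    node.key = new_key
    parent = node.parent
    if parent is not None and node.key < parent.key:
        parent.children.remove(node)   # O(1) with the right representation; see Step Three
        parent.order -= 1              # the parent now has one fewer child
        node.parent = None
        heap.roots.append(node)        # the subtree rooted here becomes a top-level tree
    if node.key < heap.min_root.key:
        heap.min_root = node
```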
What will the effect of this operation be? Several things will happen.
1. The node that previously had our node as a child now thinks it has the wrong number of children. Recall that the root of a binomial tree of order n is defined to have n children, but that's not true any more.
2. The number of trees in the root list will go up, increasing the cost of future dequeue operations.
3. The trees in our heap aren't necessarily going to be binomial trees any more. They might be "formerly" binomial trees that lost children at various points in time.
Number (1) isn't too much of a problem. If we cut a node from its parent, we can just decrease the order of that node by one to indicate that it has fewer children than it thought it previously did. Number (2) also isn't a problem. We can just backcharge the extra work done in the next dequeue-min operation to the decrease-key operation.
Number (3) is a very, very serious issue that we will need to address. Here's the problem: the efficiency of a binomial heap partially stems from the fact that any collection of n nodes can be stored in a collection of Θ(log n) trees of different orders. The reason for this is that a binomial tree of order n has 2^n nodes in it. If we can start cutting nodes out of trees, then we risk having trees that have a large number of children (that is, a high order) but which don't have many nodes in them. For example, suppose we start with a single tree of order k and then perform decrease-key operations on all the grandchildren of its root. This leaves behind a tree whose root still has order k, but which only contains k + 1 total nodes. If we keep repeating this process everywhere, we might end up with a bunch of trees of various orders that have a very small number of nodes in them. Consequently, when we do our coalesce operation to group the trees together, we might not reduce the number of trees to a manageable level, breaking the Θ(log n)-time bound that we really don't want to lose.
At this point, we're in a bit of a bind. We need to have a lot of flexibility with how the trees can be reshaped so that we can get the O(1) time decrease-key functionality, but we can't let the trees get reshaped arbitrarily or we will end up with dequeue-min's amortized runtime increasing to something greater than O(log n).
The insight needed - and, quite honestly, what I think is the real genius in the Fibonacci heap - is a compromise between the two. The idea is the following. If we cut a tree from its parent, we're already planning on decreasing the rank of the parent node by one. The problem really arises when a node loses a lot of children, in which case its rank decreases significantly without any nodes higher up in the tree knowing about it. Therefore, we will say that each node is only allowed to lose one child. If a node loses a second child, then we'll cut that node from its parent, which propagates the information that nodes are missing higher up in the tree.
It turns out that this is a great compromise. It lets us do decrease-keys quickly in most contexts (as long as the node's ancestors haven't already each lost a child), and only rarely do we have to "propagate" a decrease-key by cutting a node from its parent and then cutting that node from its grandparent.
To keep track of which nodes have lost children, we'll assign a "mark" bit to each node. Each node will initially have the mark bit cleared, but whenever it loses a child it will have the bit set. If it loses a second child after the bit has already been set, we'll clear the bit, then cut the node from its parent.
Modification 5: Assign a mark bit to each node that is initially false. When a child is cut from an unmarked parent, mark the parent. When a child is cut from a marked parent, unmark the parent and cut the parent from its parent.
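Putting Modifications 4 and 5 together, here's a sketch of the cut logic, assuming each node now also has a `marked` flag (initially `False`) alongside the hypothetical fields used above:

```python
def cut(heap, node):
    """Detach `node` from its parent, move it to the root list, and clear its mark."""
    parent = node.parent
    parent.children.remove(node)       # O(1) with the circular-list representation (Step Three)
    parent.order -= 1
    node.parent = None
    node.marked = False                # roots are never marked
    heap.roots.append(node)

def cascading_cut(heap, node):
    """A node may lose one child for free; when it loses a second, it gets cut as well."""
    parent = node.parent
    if parent is None:
        return                         # the root list absorbs the loss, nothing to propagate
    if not node.marked:
        node.marked = True             # first lost child: just remember it happened
    else:
        cut(heap, node)                # second lost child: cut this node too...
        cascading_cut(heap, parent)    # ...and check whether its parent must now be cut

def decrease_key(heap, node, new_key):
    node.key = new_key
    parent = node.parent
    if parent is not None and node.key < parent.key:
        cut(heap, node)
        cascading_cut(heap, parent)
    if node.key < heap.min_root.key:
        heap.min_root = node
```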
In this CS Theory Stack Exchange question and this older Stack Overflow question, I've sketched out a proof that shows that if trees are allowed to be modified in this way, then any tree of order n must contain at least Θ(φ^n) nodes, where φ is the golden ratio, about 1.618. This means that the number of nodes in each tree is still exponential in the order of the tree, though with a smaller base than before. As a result, the analysis we did earlier about the time complexity of the dequeue-min operation still holds, though the term hidden in the Θ(log n) bit will be different.
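For reference, the heart of that proof is a recurrence on S(k), the smallest number of nodes a tree of order k can have under these cutting rules: the i-th remaining child of an order-k root must itself have order at least i - 2 (it had order at least i - 1 when it was linked and has lost at most one child since), which gives

```
S(0) = 1,   S(1) = 2
S(k) ≥ 2 + S(0) + S(1) + ... + S(k-2)     for k ≥ 2
     ≥ F(k+2)                             (the (k+2)nd Fibonacci number)
     ≥ φ^k
```

This is where the Fibonacci numbers, and the data structure's name, come from.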
There's one very last thing to consider - what about the complexity of decrease-key? Previously, it was O(1) because we just cut the tree rooted at the appropriate node and moved it to the root list. However, now we might have to do a "cascading cut," in which we cut a node from its parent, then cut that node from its parent, etc. How does that give O(1) time decrease-keys?
The observation here is that we can add a "charge" to each decrease-key operation that we can then spend to cut the parent node from its parent. Since we only cut a node from its parent if it's already lost two children, we can pretend that each decrease-key operation pays for the work necessary to cut its parent node. When we do cut the parent, we can charge the cost of doing so back to one of the earlier decrease-key operations. Consequently, even though any individual decrease-key operation might take a long time to finish, we can always amortize the work across the earlier calls so that the runtime is amortized O(1).
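One standard way to make that charging argument precise, going a bit beyond the informal sketch above, is to fold the mark bits into the potential function from Step One:

```
Φ = (#trees in the root list) + 2·(#marked nodes)
```

Each cascading cut turns a marked non-root into an unmarked root, so it adds 1 to the tree count but removes 2 for the cleared mark, lowering Φ by 1 and paying for itself; the rest of a decrease-key raises Φ by only O(1) (one new root and at most one newly marked node), so the amortized cost comes out to O(1).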
Step Three: Linked Lists Abound!
There is one final detail we have to talk about. The data structure I've described so far is tricky, but it doesn't seem catastrophically complicated. Fibonacci heaps have a reputation for being fearsome... why is that?
The reason is that in order to implement all of the operations described above, the tree structures need to be implemented in very clever ways.
Typically, you'd represent a multiway tree either by having each parent point down to all the children (perhaps by having an array of children) or by using the left-child/right-sibling representation, where the parent has a pointer to one child, which in turn points to a list of the other children. For a binomial heap, this is perfect. The main operation we need to do on trees is a join operation in which we make one root node a child of another, so it's perfectly reasonable to have the pointers in the tree directed from parents to children.
The problem in a Fibonacci heap is that this representation is inefficient when considering the decrease-key step. Fibonacci heaps need to support all the basic pointer manipulations of a standard binomial heap and the ability to cut a single child from a parent.
Consider the standard representations of multiway trees. If we represent the tree by having each parent node store an array or list of pointers to its children, then we can't efficiently (in O(1)) remove a child node from the list of children. In other words, the runtime for decrease-key would be dominated by the bookkeeping step of removing the child rather than the logical step of moving a subtree to the root list! The same issue appears in the left-child, right-sibling representation.
The solution to this problem is to store the tree in a bizarre fashion. Each parent node stores a pointer to a single (and arbitrary) one of its children. The children are then stored in a circular, doubly-linked list, and each points back up to its parent. Since it's possible to concatenate two circular, doubly-linked lists in O(1) time and to insert or remove a single entry from one in O(1) time, this makes it possible to efficiently support the necessary tree operations:
- Make one tree a child of another: if the first tree has no children, set its child pointer to point to the second tree. Otherwise, splice the second tree into the circularly-linked child list of the first tree.
- Remove a child from a tree: splice that child node out of the linked list of children for the parent node. If it's the single node chosen to represent the children of the parent node, choose one of the sibling nodes to replace it (or set the pointer to null if it's the last child.)
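Here's a sketch of what that representation can look like (the field names are my own, not canonical): each node stores its parent, a pointer to one arbitrary child, and left/right pointers into the circular, doubly-linked list of its siblings.

```python
class FibNode:
    """Sketch of a node in the tree representation described above (hypothetical field names)."""
    def __init__(self, key):
        self.key = key
        self.order = 0         # number of children
        self.marked = False
        self.parent = None
        self.child = None      # one arbitrary child, or None
        self.left = self       # siblings form a circular doubly-linked list;
        self.right = self      # a fresh node is a one-element list containing itself

def splice_out(node):
    """Remove `node` from whatever circular sibling list it's in, in O(1)."""
    node.left.right = node.right
    node.right.left = node.left
    node.left = node.right = node

def splice_in(anchor, node):
    """Insert `node` into the circular list containing `anchor`, in O(1)."""
    node.right = anchor.right
    node.left = anchor
    anchor.right.left = node
    anchor.right = node

def add_child(parent, node):
    """'Make one tree a child of another': hang `node` off `parent`."""
    node.parent = parent
    if parent.child is None:
        parent.child = node            # first child: it's already its own circular list
    else:
        splice_in(parent.child, node)
    parent.order += 1

def remove_child(parent, node):
    """'Remove a child from a tree': cut `node` out of `parent`'s child list."""
    if parent.child is node:           # if it's the representative child, pick a sibling...
        parent.child = node.right if node.right is not node else None   # ...or None if alone
    splice_out(node)
    node.parent = None
    parent.order -= 1
```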
There are absurdly many cases to consider and check when performing all these operations simply due to the number of different edge cases that can arise. The overhead associated with all the pointer juggling is one of the reasons why Fibonacci heaps are slower in practice than their asymptotic complexity might suggest (the other big one is the logic for removing the minimum value, which requires an auxiliary data structure).
Modification 6: Use a custom representation of the tree that supports efficient joining of trees and cutting one tree from another.
Conclusion
I hope this answer sheds some light on the mystery that is the Fibonacci heap. I hope that you can see the logical progression from a simpler structure (the binomial heap) to a more complex structure by a series of simple steps based on reasonable insights. It's not unreasonable to want to make insertions amortized-efficient at the expense of deletions, and it's similarly not too crazy to implement decrease-key by cutting out subtrees. From there, the rest of the details are in ensuring that the structure is still efficient, but they're more consequences of the other parts rather than causes.
If you're interested in learning more about Fibonacci heaps, you may want to check out this two-part series of lecture slides. Part one introduces binomial heaps and shows how lazy binomial heaps work. Part two explores Fibonacci heaps. These slides go into more mathematical depth than what I've covered here.
Hope this helps!