The best way to calculate the height in a binary search tree? (balancing an AVL-tree)
I'm looking for the best way to calculate a nodes balance in an AVL-tree. I thought I had it working, but after some heavy inserting/updating I can see that it's not working correct (at all).
This is kind of a two-part question, the first part would be how to calculate the height of a sub-tree, I know the definition "The height of a node is the length of the longest downward path to a leaf from that node." and I understand it, but I fail at implementing it. And to confuse me further this quote can be found on wikipedia on tree-heights "Conventionally, the value -1 corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one node."
And the second part is getting the balance factor of a sub-tree in an AVL tree, I've got no problem understanding the concept, "get the height of your L
and R
sub-trees and subtract R
from L
". And this is defined as something like this: BALANCE = NODE[L][HEIGHT] - NODE[R][HEIGT]
Reading on wikipedia says this on the first few lines describing insertions into an AVL tree: "If the balance factor becomes -1, 0, or 1 then the tree is still in AVL form, and no rotations are necessary."
It then goes on, saying this "If the balance factor becomes 2 or -2 then the tree rooted at this node is unbalanced, and a tree rotation is needed. At most a single or double rotation will be needed to balance the tree." - which I have no trouble grasping.
But (yes, there's always a but).
Here's where it gets confusing, the text states "If the balance factor of R is 1, it means the insertion occurred on the (external) right side of that node and a left rotation is needed". But from m understanding the text said (as I quoted) that if the balance factor was within [-1, 1]
then there was no need for balancing?
I feel I'm so close to grasping the concept, I've gotten the tree rotations down, implemented a normal binary search tree, and on the brink of grasping AVL-trees but just seem to be missing that essential epiphany.
Edit: Code examples are preferred over academic formulas as I've always had an easier time grasping something in code, but any help is greatly appreciated.
Edit: I wish I could mark all answers as "accepted", but for me NIck's answer was the first that made me go "aha".
Part 1 - height
As starblue says, height is just recursive. In pseudo-code:
height(node) = max(height(node.L), height(node.R)) + 1
Now height could be defined in two ways. It could be the number of nodes in the path from the root to that node, or it could be the number of links. According to the page you referenced, the most common definition is for the number of links. In which case the complete pseudo code would be:
height(node):
if node == null:
return -1
else:
return max(height(node.L), height(node.R)) + 1
If you wanted the number of nodes the code would be:
height(node):
if node == null:
return 0
else:
return max(height(node.L), height(node.R)) + 1
Either way, the rebalancing algorithm I think should work the same.
However, your tree will be much more efficient (O(ln(n))) if you store and update height information in the tree, rather than calculating it each time. (O(n))
Part 2 - balancing
When it says "If the balance factor of R is 1", it is talking about the balance factor of the right branch, when the balance factor at the top is 2. It is telling you how to choose whether to do a single rotation or a double rotation. In (python like) Pseudo-code:
if balance factor(top) = 2: // right is imbalanced
if balance factor(R) = 1: //
do a left rotation
else if balance factor(R) = -1:
do a double rotation
else: // must be -2, left is imbalanced
if balance factor(L) = 1: //
do a left rotation
else if balance factor(L) = -1:
do a double rotation
I hope this makes sense
You do not need to calculate tree depths on the fly.
You can maintain them as you perform operations.
Furthermore, you don't actually in fact have to maintain track of depths; you can simply keep track of the difference between the left and right tree depths.
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_avl.aspx
Just keeping track of the balance factor (difference between left and right subtrees) is I found easier from a programming POV, except that sorting out the balance factor after a rotation is a PITA...
Height is easily implemented by recursion, take the maximum of the height of the subtrees plus one.
The "balance factor of R" refers to the right subtree of the tree which is out of balance, I suppose.
Here's an alternate way of finding height. Add an additional attribute to your node called height:
class Node
{
data value; //data is a custom data type
node right;
node left;
int height;
}
Now, we'll do a simple breadth-first traversal of the tree, and keep updating the height value for each node:
int height (Node root)
{
Queue<Node> q = Queue<Node>();
Node lastnode;
//reset height
root.height = 0;
q.Enqueue(root);
while(q.Count > 0)
{
lastnode = q.Dequeue();
if (lastnode.left != null){
lastnode.left.height = lastnode.height + 1;
q.Enqueue(lastnode.left);
}
if (lastnode.right != null){
lastnode.right.height = lastnode.height + 1;
q.Enqueue(lastnode.right);
}
}
return lastnode.height; //this will return a 0-based height, so just a root has a height of 0
}
Cheers,