How to determine if binary tree is balanced?
It's been a while from those school years. Got a job as IT specialist at a hospital. Trying to move to do some actual programming now. I'm working on binary trees now, and I was wondering what would be the best way to determine if the tree is height-balanced.
I was thinking of something along this:
public boolean isBalanced(Node root){
if(root==null){
return true; //tree is empty
}
else{
int lh = root.left.height();
int rh = root.right.height();
if(lh - rh > 1 || rh - lh > 1){
return false;
}
}
return true;
}
Is this a good implementation? or am I missing something?
Stumbled across this old question while searching for something else. I notice that you never did get a complete answer.
The way to solve this problem is to start by writing a specification for the function you are trying to write.
Specification: A well-formed binary tree is said to be "height-balanced" if (1) it is empty, or (2) its left and right children are height-balanced and the height of the left tree is within 1 of the height of the right tree.
Now that you have the specification, the code is trivial to write. Just follow the specification:
IsHeightBalanced(tree)
return (tree is empty) or
(IsHeightBalanced(tree.left) and
IsHeightBalanced(tree.right) and
abs(Height(tree.left) - Height(tree.right)) <= 1)
Translating that into the programming language of your choice should be trivial.
Bonus exercise: this naive code sketch traverses the tree far too many times when computing the heights. Can you make it more efficient?
Super bonus exercise: suppose the tree is massively unbalanced. Like, a million nodes deep on one side and three deep on the other. Is there a scenario in which this algorithm blows the stack? Can you fix the implementation so that it never blows the stack, even when given a massively unbalanced tree?
UPDATE: Donal Fellows points out in his answer that there are different definitions of 'balanced' that one could choose. For example, one could take a stricter definition of "height balanced", and require that the path length to the nearest empty child is within one of the path to the farthest empty child. My definition is less strict than that, and therefore admits more trees.
One can also be less strict than my definition; one could say that a balanced tree is one in which the maximum path length to an empty tree on each branch differs by no more than two, or three, or some other constant. Or that the maximum path length is some fraction of the minimum path length, like a half or a quarter.
It really doesn't matter usually. The point of any tree-balancing algorithm is to ensure that you do not wind up in the situation where you have a million nodes on one side and three on the other. Donal's definition is fine in theory, but in practice it is a pain coming up with a tree-balancing algorithm that meets that level of strictness. The performance savings usually does not justify the implementation cost. You spend a lot of time doing unnecessary tree rearrangements in order to attain a level of balance that in practice makes little difference. Who cares if sometimes it takes forty branches to get to the farthest leaf in a million-node imperfectly-balanced tree when it could in theory take only twenty in a perfectly balanced tree? The point is that it doesn't ever take a million. Getting from a worst case of a million down to a worst case of forty is usually good enough; you don't have to go all the way to the optimal case.
Balance is a truly subtle property; you think you know what it is, but it's so easy to get wrong. In particular, even Eric Lippert's (good) answer is off. That's because the notion of height is not enough. You need to have the concept of minimum and maximum heights of a tree (where the minimum height is the least number of steps from the root to a leaf, and the maximum is... well, you get the picture). Given that, we can define balance to be:
A tree where the maximum height of any branch is no more than one more than the minimum height of any branch.
(This actually implies that the branches are themselves balanced; you can pick the same branch for both maximum and minimum.)
All you need to do to verify this property is a simple tree traversal keeping track of the current depth. The first time you backtrack, that gives you a baseline depth. Each time after that when you backtrack, you compare the new depth against the baseline
- if it's equal to the baseline, then you just continue
- if it is more than one different, the tree isn't balanced
- if it is one off, then you now know the range for balance, and all subsequent depths (when you're about to backtrack) must be either the first or the second value.
In code:
class Tree {
Tree left, right;
static interface Observer {
public void before();
public void after();
public boolean end();
}
static boolean traverse(Tree t, Observer o) {
if (t == null) {
return o.end();
} else {
o.before();
try {
if (traverse(left, o))
return traverse(right, o);
return false;
} finally {
o.after();
}
}
}
boolean balanced() {
final Integer[] heights = new Integer[2];
return traverse(this, new Observer() {
int h;
public void before() { h++; }
public void after() { h--; }
public boolean end() {
if (heights[0] == null) {
heights[0] = h;
} else if (Math.abs(heights[0] - h) > 1) {
return false;
} else if (heights[0] != h) {
if (heights[1] == null) {
heights[1] = h;
} else if (heights[1] != h) {
return false;
}
}
return true;
}
});
}
}
I suppose you could do this without using the Observer pattern, but I find it easier to reason this way.
[EDIT]: Why you can't just take the height of each side. Consider this tree:
/\
/ \
/ \
/ \_____
/\ / \_
/ \ / / \
/\ C /\ / \
/ \ / \ /\ /\
A B D E F G H J
OK, a bit messy, but each side of the root is balanced: C
is depth 2, A
, B
, D
, E
are depth 3, and F
, G
, H
, J
are depth 4. The height of the left branch is 2 (remember the height decreases as you traverse the branch), the height of the right branch is 3. Yet the overall tree is not balanced as there is a difference in height of 2 between C
and F
. You need a minimax specification (though the actual algorithm can be less complex as there should be only two permitted heights).