How to find the lowest common ancestor of two nodes in any binary tree?

The Binary Tree here is may not necessarily be a Binary Search Tree.
The structure could be taken as -

struct node {
    int data;
    struct node *left;
    struct node *right;
};

The maximum solution I could work out with a friend was something of this sort -
Consider this binary tree :

Binary Tree

The inorder traversal yields - 8, 4, 9, 2, 5, 1, 6, 3, 7

And the postorder traversal yields - 8, 9, 4, 5, 2, 6, 7, 3, 1

So for instance, if we want to find the common ancestor of nodes 8 and 5, then we make a list of all the nodes which are between 8 and 5 in the inorder tree traversal, which in this case happens to be [4, 9, 2]. Then we check which node in this list appears last in the postorder traversal, which is 2. Hence the common ancestor for 8 and 5 is 2.

The complexity for this algorithm, I believe is O(n) (O(n) for inorder/postorder traversals, the rest of the steps again being O(n) since they are nothing more than simple iterations in arrays). But there is a strong chance that this is wrong. :-)

But this is a very crude approach, and I'm not sure if it breaks down for some case. Is there any other (possibly more optimal) solution to this problem?


Starting from root node and moving downwards if you find any node that has either p or q as its direct child then it is the LCA. (edit - this should be if p or q is the node's value, return it. Otherwise it will fail when one of p or q is a direct child of the other.)

Else if you find a node with p in its right(or left) subtree and q in its left(or right) subtree then it is the LCA.

The fixed code looks like:

treeNodePtr findLCA(treeNodePtr root, treeNodePtr p, treeNodePtr q) {

        // no root no LCA.
        if(!root) {
                return NULL;
        }

        // if either p or q is the root then root is LCA.
        if(root==p || root==q) {
                return root;
        } else {
                // get LCA of p and q in left subtree.
                treeNodePtr l=findLCA(root->left , p , q);

                // get LCA of p and q in right subtree.
                treeNodePtr r=findLCA(root->right , p, q);

                // if one of p or q is in leftsubtree and other is in right
                // then root it the LCA.
                if(l && r) {
                        return root;
                }
                // else if l is not null, l is LCA.
                else if(l) {
                        return l;
                } else {
                        return r;
                }
        }
}

The below code fails when either is the direct child of other.

treeNodePtr findLCA(treeNodePtr root, treeNodePtr p, treeNodePtr q) {

        // no root no LCA.
        if(!root) {
                return NULL;
        }

        // if either p or q is direct child of root then root is LCA.
        if(root->left==p || root->left==q || 
           root->right ==p || root->right ==q) {
                return root;
        } else {
                // get LCA of p and q in left subtree.
                treeNodePtr l=findLCA(root->left , p , q);

                // get LCA of p and q in right subtree.
                treeNodePtr r=findLCA(root->right , p, q);

                // if one of p or q is in leftsubtree and other is in right
                // then root it the LCA.
                if(l && r) {
                        return root;
                }
                // else if l is not null, l is LCA.
                else if(l) {
                        return l;
                } else {
                        return r;
                }
        }
}

Code In Action


Nick Johnson is correct that a an O(n) time complexity algorithm is the best you can do if you have no parent pointers.) For a simple recursive version of that algorithm see the code in Kinding's post which runs in O(n) time.

But keep in mind that if your nodes have parent pointers, an improved algorithm is possible. For both nodes in question construct a list containing the path from root to the node by starting at the node, and front inserting the parent.

So for 8 in your example, you get (showing steps): {4}, {2, 4}, {1, 2, 4}

Do the same for your other node in question, resulting in (steps not shown): {1, 2}

Now compare the two lists you made looking for the first element where the list differ, or the last element of one of the lists, whichever comes first.

This algorithm requires O(h) time where h is the height of the tree. In the worst case O(h) is equivalent to O(n), but if the tree is balanced, that is only O(log(n)). It also requires O(h) space. An improved version is possible that uses only constant space, with code shown in CEGRD's post


Regardless of how the tree is constructed, if this will be an operation you perform many times on the tree without changing it in between, there are other algorithms you can use that require O(n) [linear] time preparation, but then finding any pair takes only O(1) [constant] time. For references to these algorithms, see the the lowest common ancestor problem page on Wikipedia. (Credit to Jason for originally posting this link)