What additional rotation is required for deletion from a Top-Down 2-3-4 Left-leaning Red Black tree?

I have been implementing an LLRB package that should be able to operate in either of the two modes, Bottom-Up 2-3 or Top-Down 2-3-4 described by Sedgewick (code - improved code, though dealing only with 2-3 trees here, thanks to RS for pointer).

Sedgewick provides a very clear description of tree operations for the 2-3 mode, although he spends a lot of time talking about the 2-3-4 mode. He also shows how a simple alteration of the order of color flipping during insertion can alter the behaviour of the tree (either split on the way down for 2-3-4 or split on the way up for 2-3):

private Node insert(Node h, Key key, Value value)
{
    if (h == null)
        return new Node(key, value);

    // Include this for 2-3-4 trees
    if (isRed(h.left) && isRed(h.right)) colorFlip(h);

    int cmp = key.compareTo(h.key);

    if (cmp == 0)     h.val = value;
    else if (cmp < 0) h.left = insert(h.left, key, value);
    else              h.right = insert(h.right, key, value);

    if (isRed(h.right) && !isRed(h.left))    h = rotateLeft(h);
    if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);

    // Include this for 2-3 trees
    if (isRed(h.left) && isRed(h.right)) colorFlip(h);

    return h;
}

However, he glosses over deletion in 2-3-4 LLRBs with the following:

The code on the next page is a full implementation of delete() for LLRB 2-3 trees. It is based on the reverse of the approach used for insert in top-down 2-3-4 trees: we perform rotations and color flips on the way down the search path to ensure that the search does not end on a 2-node, so that we can just delete the node at the bottom. We use the method fixUp() to share the code for the color flip and rotations following the recursive calls in the insert() code. With fixUp(), we can leave right-leaning red links and unbalanced 4-nodes along the search path, secure that these conditions will be fixed on the way up the tree. (The approach is also effective 2-3-4 trees, but requires an extra rotation when the right node off the search path is a 4-node.)

His delete() function:

private Node delete(Node h, Key key)
{
    if (key.compareTo(h.key) < 0)
        {
            if (!isRed(h.left) && !isRed(h.left.left))
            h = moveRedLeft(h);
            h.left = delete(h.left, key);
        }
    else
        {
            if (isRed(h.left))
                h = rotateRight(h);
            if (key.compareTo(h.key) == 0 && (h.right == null))
                return null;
            if (!isRed(h.right) && !isRed(h.right.left))
                h = moveRedRight(h);
            if (key.compareTo(h.key) == 0)
                {
                    h.val = get(h.right, min(h.right).key);
                    h.key = min(h.right).key;
                    h.right = deleteMin(h.right);
                }
            else h.right = delete(h.right, key);
        }
    return fixUp(h);
}

My implementation correctly maintains LLRB 2-3 invariants for all tree operations on 2-3 trees, but fails for a subclass of right-sided deletions on 2-3-4 trees (these failing deletions result in right leaning red nodes, but snowball to tree imbalance and finally null pointer dereferencing). From a survey of example code that discusses LLRB trees and includes options for construction of trees in either mode, it seems that none correctly implements the deletion from a 2-3-4 LLRB (i.e. none has the extra rotation alluded to, e.g. Sedgewick's java above and here).

I'm having trouble figuring out exactly what he means by "extra rotation when the right node off the search path is a 4-node"; presumably this is a rotate left, but where and when?

If I rotate left passing upwards past a 4-node equivalent (i.e. RR node) or a right leaning 3-node equivalent (BR node) either before calling fixUp() or at the end of the fixUp function I still get the same invariant contradiction.

Here are the tree states of the smallest failing examples I have found (generated by sequential insertion of elements from 0 to the respective maximum value).

The first pair of trees shows the transition from invariant-conforming state prior to deletion of element 15 to obviously broken state after.

A 2-3-4 LLRB containing 0..15

State after deleting item 15

The second is essentially the same as above, but with deletion of 16 of 0..16 (deletion of 15 results in the same topology). Note that the invariant contradiction manages to cross the root node.

A 2-3-4 LLRB containing 0..16

State after deleting item 16

The key is going to be understanding how to revert the violations generated during the walk down the tree to the target node. The following two trees show how the first tree above looks after a walk down the left and right respectively (without deletion and before walking back up with fixUp()).

After attempt to delete '-1' without fixUp: After attempt to delete '-1' without fixUp

After attempt to delete '16' without fixUp: After attempt to delete '16' without fixUp

Trying rotate left on the walk back up when the node has only a red right child seems to be part of the solution, but it does not deal correctly with two red right children in a row, preceding this with a flipColor when both children are red seems to improve the situation further, but still leaves some invariants violated.

If I further check whether the right child of a right child is red when its sibling is black and rotate left if this is true I only fail once, but at this point I feel like I'm needing a new theory rather than a new epicycle.

Any ideas?

For reference, my implementation is available here (No, it's not Java).

Follow-up:

Part of the reason I was interested in this was to confirm the claim by many that 2-3 LLRB trees are more efficient than 2-3-4 LLRB trees. My benchmarking has confirmed this for insertion and deletion (2-3 are about 9% faster), but I find that retrieval is very slightly faster for 2-3-4 trees.

The following times are representative and consistent across runs:

BU23:
BenchmarkInsert        1000000        1546 ns/op
BenchmarkDelete        1000000        1974 ns/op
BenchmarkGet           5000000         770 ns/op

TD234:
BenchmarkInsert        1000000        1689 ns/op
BenchmarkDelete        1000000        2133 ns/op
BenchmarkGet           5000000         753 ns/op

First column is bench name, second is number of operations, third is result. Benchmark on i5M 2.27.

I have had a look at branch lengths for 2-3 tree and 2-3-4 trees and there is little in that to explain the retrieval difference (mean distance from root to node and S.D. of 1000 trees each with 10000 random inserts):

Means:
TD234 leafs  BU23 leafs 
 12.88940     12.84681 
TD234 all    BU23 all 
 11.79274     11.79163 

StdDev:
TD234 leafs  BU23 leafs 
 1.222458     1.257344 
TD234 all    BU23 all 
 1.874335     1.885204 

Solution 1:

Updated and verified

Of key importance to testing this is that the implementation doesn't support deleting a nonexistent or previously deleted node! I spent way too long trying to figure out why my working solution was "broken". This can be fixed by doing a preliminary search for the key and returning false if it's not in the tree at all, and that solution was employed in the linked code at the bottom.

It doesn't appear Sedgewick wrote a deletion for 2-3-4 deletion that is publicly available. His results specifically deal with 2-3 trees (he only makes cursory mention of 2-3-4 trees in that their average path length (and thus search cost), as well as that of other red-black trees, is indistinguishable from the 2-3 case). Nobody else seems to have one easily found, either, so here's what I found after debugging the problem:

To begin, take Sedgewick's code and fix the out of date bits. In the slides here (pg 31) you can see that his code still uses the old representation of 4 nodes where it was done by having two left reds in a row, rather than balance. The first bit to write a 2-3-4 deletion routine, then, is to fix this so that we can do a sanity check which will help us verify our fixes later:

private boolean is234(Node x)
{         
   if (x == null)
      return true;
   // Note the TD234 check is here because we also want this method to verify 2-3 trees
   if (isRed(x.right))
      return species == TD234 && isRed(x.left);

   if (!isRed(x.right))
      return true;

   return is234(x.left) && is234(x.right);
} 

Once we have this, we know a couple things. One, from the paper we see that 4 nodes should not be broken on the way up when using a 2-3-4 tree. Two, there's a special case for a right 4-node on the search path. There's a third special case that isn't mentioned, and that is when you are going back up a tree, you may end up where you have h.right.left be red, which would leave you invalid with just a rotate left. This is the mirror of the case described for insert on page 4 of the paper.

The rotation fix for a 4-node you need is as follows:

    private Node moveRedLeft(Node h)
    {  // Assuming that h is red and both h.left and h.left.left
       // are black, make h.left or one of its children red.
       colorFlip(h);
       if (isRed(h.right.left))
       { 
          h.right = rotateRight(h.right);

          h = rotateLeft(h);
          colorFlip(h);

          if (isRed(h.right.right) )
             h.right = rotateLeft(h.right);
       }
      return h;
    }

And this removes the splitting on 2-3-4, as well as adds the fix for the third special case

private Node fixUp(Node h)
{
   if (isRed(h.right))
   {      
      if (species == TD234 && isRed(h.right.left))
         h.right = rotateRight(h.right);
      h = rotateLeft(h);
   }

   if (isRed(h.left) && isRed(h.left.left))
      h = rotateRight(h);

   if (species == BU23 && isRed(h.left) && isRed(h.right))
      colorFlip(h);

   return setN(h);
}

Finally, we need to test this and make sure it works. They don't have to be the most efficient, but as I found during the debugging of this, they have to actually work with the expected tree behavior (i.e. not insert/delete duplicate data)! I did this with a test helper methods. The commented lines were there for when I was debugging, I'd break and check the tree for obvious imbalance. I've tried this method with 100000 nodes, and it performed flawlessly:

   public static boolean Test()
   {
      return Test(System.nanoTime());
   }
   public static boolean Test(long seed)
   {
      StdOut.println("Seeding test with: " + seed);
      Random r = new Random(seed);
      RedBlackBST<Integer, Integer> llrb = new RedBlackBST<Integer,Integer>(TD234);
      ArrayList<Integer> treeValues = new ArrayList<Integer>();
      for (int i = 0; i < 1000; i++)
      {
         int val = r.nextInt();
         if (!treeValues.contains(val))
         {
            treeValues.add(val);
            llrb.put(val, val);
         }
         else
            i--;
      }
      for (int i = 0; i < treeValues.size(); i++)
      {
         llrb.delete(treeValues.get(i));
         if (!llrb.check())
         {
            return false;
         }
//         StdDraw.clear(Color.GRAY);
//         llrb.draw(.95, .0025, .008);
      }
      return true;
   }

The complete source can be found here.