Red-Black Trees

I've seen binary trees and binary searching mentioned in several books I've read lately, but as I'm still at the beginning of my studies in Computer Science, I've yet to take a class that's really dealt with algorithms and data structures in a serious way.

I've checked around the typical sources (Wikipedia, Google) and most descriptions of the usefulness and implementation of (in particular) Red-Black trees have come off as dense and difficult to understand. I'm sure for someone with the necessary background, it makes perfect sense, but at the moment it reads like a foreign language almost.

So what makes binary trees useful in some of the common tasks you find yourself doing while programming? Beyond that, which trees do you prefer to use (please include a sample implementation) and why?

Red Black trees are good for creating well-balanced trees. The major problem with binary search trees is that you can make them unbalanced very easily. Imagine your first number is a 15. Then all the numbers after that are increasingly smaller than 15. You'll have a tree that is very heavy on the left side and has nothing on the right side.

Red Black trees solve that by forcing your tree to be balanced whenever you insert or delete. It accomplishes this through a series of rotations between ancestor nodes and child nodes. The algorithm is actually pretty straightforward, although it is a bit long. I'd suggest picking up the CLRS (Cormen, Lieserson, Rivest and Stein) textbook, "Introduction to Algorithms" and reading up on RB Trees.

The implementation is also not really so short so it's probably not really best to include it here. Nevertheless, trees are used extensively for high performance apps that need access to lots of data. They provide a very efficient way of finding nodes, with a relatively small overhead of insertion/deletion. Again, I'd suggest looking at CLRS to read up on how they're used.

While BSTs may not be used explicitly - one example of the use of trees in general are in almost every single modern RDBMS. Similarly, your file system is almost certainly represented as some sort of tree structure, and files are likewise indexed that way. Trees power Google. Trees power just about every website on the internet.

I'd like to address only the question "So what makes binary trees useful in some of the common tasks you find yourself doing while programming?"

This is a big topic that many people disagree on. Some say that the algorithms taught in a CS degree such as binary search trees and directed graphs are not used in day-to-day programming and are therefore irrelevant. Others disagree, saying that these algorithms and data structures are the foundation for all of our programming and it is essential to understand them, even if you never have to write one for yourself. This filters into conversations about good interviewing and hiring practices. For example, Steve Yegge has an article on interviewing at Google that addresses this question. Remember this debate; experienced people disagree.

In typical business programming you may not need to create binary trees or even trees very often at all. However, you will use many classes which internally operate using trees. Many of the core organization classes in every language use trees and hashes to store and access data.

If you are involved in more high-performance endeavors or situations that are somewhat outside the norm of business programming, you will find trees to be an immediate friend. As another poster said, trees are core data structures for databases and indexes of all kinds. They are useful in data mining and visualization, advanced graphics (2d and 3d), and a host of other computational problems.

I have used binary trees in the form of BSP (binary space partitioning) trees in 3d graphics. I am currently looking at trees again to sort large amounts of geocoded data and other data for information visualization in Flash/Flex applications. Whenever you are pushing the boundary of the hardware or you want to run on lower hardware specifications, understanding and selecting the best algorithm can make the difference between failure and success.

None of the answers mention what it is exactly BSTs are good for.

If what you want to do is just lookup by values then a hashtable is much faster, O(1) insert and lookup (amortized best case).

A BST will be O(log N) lookup where N is the number of nodes in the tree, inserts are also O(log N).

RB and AVL trees are important like another answer mentioned because of this property, if a plain BST is created with in-order values then the tree will be as high as the number of values inserted, this is bad for lookup performance.

The difference between RB and AVL trees are in the the rotations required to rebalance after an insert or delete, AVL trees are O(log N) for rebalances while RB trees are O(1). An example of benefit of this constant complexity is in a case where you might be keeping a persistent data source, if you need to track changes to roll-back you would have to track O(log N) possible changes with an AVL tree.

Why would you be willing to pay for the cost of a tree over a hash table? ORDER! Hash tables have no order, BSTs on the other hand are always naturally ordered by virtue of their structure. So if you find yourself throwing a bunch of data in an array or other container and then sorting it later, a BST may be a better solution.

The tree's order property gives you a number of ordered iteration capabilities, in-order, depth-first, breadth-first, pre-order, post-order. These iteration algorithms are useful in different circumstances if you want to look them up.

Red black trees are used internally in almost every ordered container of language libraries, C++ Set and Map, .NET SortedDictionary, Java TreeSet, etc...

So trees are very useful, and you may use them quite often without even knowing it. You most likely will never need to write one yourself, though I would highly recommend it as an interesting programming exercise.

Red Black Trees and B-trees are used in all sorts of persistent storage; because the trees are balanced the performance of breadth and depth traversals are mitigated.

Nearly all modern database systems use trees for data storage.

BSTs make the world go round, as said by Micheal. If you're looking for a good tree to implement, take a look at AVL trees (Wikipedia). They have a balancing condition, so they are guaranteed to be O(logn). This kind of searching efficiency makes it logical to put into any kind of indexing process. The only thing that would be more efficient would be a hashing function, but those get ugly quick, fast, and in a hurry. Also, you run into the Birthday Paradox (also known as the pigeon-hole problem).

What textbook are you using? We used Data Structures and Analysis in Java by Mark Allen Weiss. I actually have it open in my lap as i'm typing this. It has a great section about Red-Black trees, and even includes the code necessary to implement all the trees it talks about.

Red-Black Trees

Related

Recent Posts