Hash table vs Balanced binary tree [closed]

Solution 1:

This question cannot be answered, in general, I fear.

The issue is that there are many types of hash tables and balanced binary trees, and their performances vary widely.

So, the naive answer is: it depends on the functionality you need. Use a hash table if you do not need ordering and a balanced binary tree otherwise.

For a more elaborate answer, let's consider some alternatives.

Hash Table (see Wikipedia's entry for some basics)

  • Not all hash tables use a linked-list as a bucket. A popular alternative is to use a "better" bucket, for example a binary tree, or another hash table (with another hash function), ...
  • Some hash tables do not use buckets at all: see Open Addressing (they come with other issues, obviously)
  • There is something called Linear re-hashing (it's a quality of implementation detail), which avoids the "stop-the-world-and-rehash" pitfall. Basically during the migration phase you only insert in the "new" table, and also move one "old" entry into the "new" table. Of course, migration phase means double look-up etc...

Binary Tree

  • Re-balancing is costly, you may consider a Skip-List (also better for multi-threaded accesses) or a Splay Tree.
  • A good allocator can "pack" nodes together in memory (better caching behavior), even though this does not alleviate the pointer-look-up issue.
  • B-Tree and variants also offer "packing"

Let's not forget that O(1) is an asymptotic complexity. For few elements, the coefficient is usually more important (performance-wise). Which is especially true if your hash function is slow...

Finally, for sets, you may also wish to consider probabilistic data structures, like Bloom Filters.

Solution 2:

Hash tables are generally better if there isn't any need to keep the data in any sort of sequence. Binary trees are better if the data must be kept sorted.

Solution 3:

A worthy point on a modern architecture: A Hash table will usually, if its load factor is low, have fewer memory reads than a binary tree will. Since memory access tend to be rather costly compared to burning CPU cycles, the Hash table is often faster.

In the following Binary tree is assumed to be self-balancing, like a red black tree, an AVL tree or like a treap.

On the other hand, if you need to rehash everything in the hash table when you decide to extend it, this may be a costly operation which occur (amortized). Binary trees does not have this limitation.

Binary trees are easier to implement in purely functional languages.

Binary trees have a natural sort order and a natural way to walk the tree for all elements.

When the load factor in the hash table is low, you may be wasting a lot of memory space, but with two pointers, binary trees tend to take up more space.

Hash tables are nearly O(1) (depending on how you handle the load factor) vs. Bin trees O(lg n).

Trees tend to be the "average performer". There are nothing they do particularly well, but then nothing they do particularly bad.