How do I find the median of numbers in linear time using heaps?

Wikipedia says:

Selection algorithms: Finding the min, max, both the min and max, median, or even the k-th largest element can be done in linear time using heaps.

All it says is that it can be done, and not how.

Can you give me some start on how this can be done using heaps?


You would use a min-max-median heap to find the min, max and median in constant time (and take linear time to build the heap). You can use order-statistics trees to find the kth smallest/largest value. Both of these data structures are described in this paper on min-max heaps [PDF]. Min-max heaps are binary heaps that alternate between min-heaps and max-heaps.

From the paper:

A min-max-median heap is a binary tree with the following properties:

  1. The median of all elements is located at the root

  2. The left subtree of the root is a min-max heap Hl of size ceiling[((n-1)/2)] containing elements less than or equal to the median. The right subtree is a max-min heap Hr of size floor[((n-1)/2)] containing only elements greater than or equal to the median.

The paper goes on to explain how to build such a heap.

Upon reading the paper more thoroughly it appears as though building the min-max-median heaps requires that you first find the median (FTA: "Find the median of all n elements using any one of the known linear-time algorithms"). That said, once you have built the heap you can maintain the median simply by maintaining the balance between the min-max heap on the left and the max-min heap on the right. DeleteMedian replaces the root with either the min of the max-min heap or the max of the min-max heap (whichever maintains the balance).

So if you plan on using a min-max-median heap to find the median of a fixed data set you're SOL but if you are using it on a changing data set it is possible.


See this wikipedia page on selection algorithms. In particular, look at the BFPRT algorithm and the Median of Medians algorithm. BFPRT is probabilistically linear, and is modelled on quicksort; Median of Medians is guaranteed linear, but has a large constant factor and so might take longer in practice, depending on the size of your dataset.

If you only have a few hundred or thousand elements from which to select the median, I suspect that a simple quicksort followed by direct indexing is easiest.