Lower bound for finding second largest element

In a recent discussion, I came across the idea of proving a lower bound for the number of comparisons required to find the largest element in an array. The bound is $n - 1$. This is so because the set of comparisons performed by every such algorithm looks like a tournament tree, which always has $n - 1$ internal nodes.

The obvious question is then, What should be the lower bound for finding the 2nd largest element in an array.


Solution 1:

The (tight) lower bound is $n + \lceil \lg n \rceil - 2$ (where $\lg n$ means $\log_2 n$).

I'll prove tightness first: that this can be achieved (apparently the idea is due to Lewis Carroll!). First find the maximum using a "tennis tournament" structure: first compare the $n$ elements in pairs, then compare the $n/2$ "winners" in pairs, and so on. (Unpaired elements get a bye to the next round.) Since every element except the winner loses exactly once, this takes $n-1$ comparisons. But now note that the second largest element must be one which lost to the winner, as it couldn't have been defeated by any other element. So you need to find the maximum among all the (up to) $\lceil \lg n \rceil$ elements that were defeated by the winner, and finding this maximum can be done in $\lceil \lg n \rceil - 1$.

We can prove this is a lower bound as well. Let the number of elements that lost to the maximum be $m$.

  • Firstly, you need to find the maximum, since one cannot be sure some element is the second maximum without knowing which element is the maximum. Further, for each element that lost to the maximum ($m$ of them), this comparison was useless in determining whether it was or not the second maximum. In other words, all elements other than the maximum must lose at least once, and all but one ($m-1$) of the elements that directly lost to the maximum must lose at least once more, no matter in which order the comparisons were made. So we need at least $n-1 + m-1 = n + m - 2$ comparisons.
  • To state the same thing differently: $n-2$ of the elements must be found less than the second-largest element — comparisons with the largest element do not help here — plus $m$ of them must lose directly to the maximum by definition, so we need at least $n - 2 + m = n + m - 2$ comparisons.

This proves the $n + \lceil \lg n\rceil - 2$ lower bound if we show that $m \ge \lceil \lg n \rceil$, i.e., an adversary can make sure you always have at least $\lceil \lg n\rceil$ elements that lost to the winner. This is proved here or (with a bug pointed out by @noedne below) here; an argument goes as follows: the adversary keeps for each element a “weight”, which is either $0$ if the element has ever lost a comparison (and is therefore known not to be the maximum element) or else the number of elements known to be less than or equal to it. Whenever a query's result is already “known”, the adversary gives that answer, else it declares the one which has a larger weight the winner (breaking ties arbitrarily). The maximum is determined when some weight grows to $n$, and with each comparison any weight can only double, so the number of comparisons needs to be at least $\lceil \lg n\rceil$.

Solution 2:

The definitive answer for any $k$ (not just the second largest) is discussed in these notes. In short, if $V(n,k)$ is the number of comparisons needed to determine the kth largest element in a set of size $n$, then $$ V(n,k) \ge n + R(n,k) - 2\sqrt{R(n,k)} $$ where $$ R(n,k) = \log \binom{n}{k} - \log (n-k+1) + 3$$