How to find the kth largest element in an unsorted array of length n in O(n)?
Solution 1:
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n)
average time, O(n^2)
worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n)
worst case time. There's some info on Wikipedia, but it's not very good.
Everything you need is in these powerpoint slides. Just to extract the basic algorithm of the O(n)
worst-case algorithm (introselect):
Select(A,n,i):
Divide input into ⌈n/5⌉ groups of size 5.
/* Partition on median-of-medians */
medians = array of each group’s median.
pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
Left Array L and Right Array G = partition(A, pivot)
/* Find ith element in L, pivot, or G */
k = |L| + 1
If i = k, return pivot
If i < k, return Select(L, k-1, i)
If i > k, return Select(G, n-k, i-k)
It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
Solution 2:
If you want a true O(n)
algorithm, as opposed to O(kn)
or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)
The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n
elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.
Here is the algorithm.
QuickSelect(A, k)
let r be chosen uniformly at random in the range 1 to length(A)
let pivot = A[r]
let A1, A2 be new arrays
# split into a pile A1 of small elements and A2 of big elements
for i = 1 to n
if A[i] < pivot then
append A[i] to A1
else if A[i] > pivot then
append A[i] to A2
else
# do nothing
end for
if k <= length(A1):
# it's in the pile of small elements
return QuickSelect(A1, k)
else if k > length(A) - length(A2)
# it's in the pile of big elements
return QuickSelect(A2, k - (length(A) - length(A2))
else
# it's equal to the pivot
return pivot
What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k
is always 1, giving a running time of
T(n) = Theta(n) + T(n-1) = Theta(n2)
But if the choices are indeed random, the expected running time is given by
T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))
where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1
or A2
.
Let's guess that T(n) <= an
for some a
. Then we get
T(n)
<= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
= cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n ai
and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn
on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an
, we get roughly 2(1/n)(n/2)an = an
. But this is too big - there's no room to squeeze in an extra cn
. So let's expand the sum using the arithmetic series formula:
∑i=floor(n/2) to n i
= ∑i=1 to n i - ∑i=1 to floor(n/2) i
= n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2
<= n2/2 - (n/4)2/2
= (15/32)n2
where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2)
factors with the much cleaner (and smaller) n/4
. Now we can continue with
cn + 2 (1/n) ∑i=floor(n/2) to n ai,
<= cn + (2a/n) (15/32) n2
= n (c + (15/16)a)
<= an
provided a > 16c
.
This gives T(n) = O(n)
. It's clearly Omega(n)
, so we get T(n) = Theta(n)
.
Solution 3:
A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17
"Make one pass through tracking the three largest values so far."
(it was specifically for 3d largest)
and this answer:
Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
Solution 4:
You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).