How to find all values that occur more than n/k times in an array?

Finding the n/10th largest key (that is, the key that would be at position n/10 if the array was sorted) takes linear time using QuickSelect. If there are less than n/10 copies of this key, then you know that there are not n/10 copies of anything above it in sorted order, because there isn't room for n/10 copies of anything above the key in sorted order. If there are n/10 or more copies, then you have found something that occurs more than n/10 times, and again there can't be anything larger than it that occurs more than n/10 times, because there isn't room for it.

Now you have an array of at most 9n/10 values smaller than the key you have just found left over from QuickSelect. Use another pass of QuickSelect to find the key n/10 from the top of this left over array. As before, you may find a key that occurs n/10 or more times, and whether you do or not you will eliminate at least n/10 entries from the array.

So you can search the whole array with 10 calls of QuickSelect, each taking linear time. 10 is a number fixed in the problem definition, so the whole operation counts as only linear time.


There is a variation of Boyer-Moore Voting algorithm which can find all the elements that occurs more than n/k in a input which runs in O(nk) and since k = 10 for your problem I think it should run in O(n * 10) = O(n) time.

From here

Following is an interesting O(nk) solution: We can solve the above problem in O(nk) time using O(k-1) extra space. Note that there can never be more than k-1 elements in output (Why?). There are mainly three steps in this algorithm.

1) Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements). Following is structure of temporary array elements.

 struct eleCount {
     int element;
     int count; };  
 struct eleCount temp[];  This step takes O(k) time.

2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.

3) Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.

The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).

Consider k = 4, n = 9 Given array: 3 1 2 2 2 1 4 3 3

i = 0

     3 _ _ temp[] has one element, 3 with count 1

i = 1

     3 1 _ temp[] has two elements, 3 and 1 with  counts 1 and 1 respectively

i = 2

     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 1 respectively.

i = 3

     - - 2 
     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.

i = 4

     - - 2 
     - - 2 
     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 3 respectively.

i = 5

     - - 2 
     - 1 2 
     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 2 and 3 respectively.  

Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6

     - - 2 
     - 1 2  temp[] has two elements, 1 and 2 with counts as 1 and 2 respectively.

i = 7

       - 2 
     3 1 2  temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.

i = 8

     3 - 2
     3 1 2  temp[] has three elements, 3, 1 and 2 with counts as 2, 1 and 2 respectively. 

Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.

For a proper proof of this approach check out this answer from cs.stackexchange


You are right.

Decimal dominant is among the next candidates after QuickSelect: n/10, 2*n/10..9*n/10, so checking only n/10 index is not sufficient

Note that dominant occupies long run in sorted array and certainly at least one of elements with mentioned indexes belongs to that run.

Example for k = 3, N = 11. Let element b occupies at least 1/3 of array. In this case sorted array might look like

b b b b * * * * * * * 
* b b b b * * * * * * 
* * b b b b * * * * *     
* * * b b b b * * * *     
* * * * b b b b * * *
* * * * * b b b b * *
* * * * * b b b b * *
* * * * * * b b b b *
* * * * * * * b b b b
      ^       ^       //positions for quickselect

Note that in any case dominant element (if k-dominant does exist) occupies at least one of marked places. So after two rounds of QuickSelect we have two candidates