How to find all values that occur more than n/k times in an array?
Finding the n/10th largest key (that is, the key that would be at position n/10 if the array was sorted) takes linear time using QuickSelect. If there are less than n/10 copies of this key, then you know that there are not n/10 copies of anything above it in sorted order, because there isn't room for n/10 copies of anything above the key in sorted order. If there are n/10 or more copies, then you have found something that occurs more than n/10 times, and again there can't be anything larger than it that occurs more than n/10 times, because there isn't room for it.
Now you have an array of at most 9n/10 values smaller than the key you have just found left over from QuickSelect. Use another pass of QuickSelect to find the key n/10 from the top of this left over array. As before, you may find a key that occurs n/10 or more times, and whether you do or not you will eliminate at least n/10 entries from the array.
So you can search the whole array with 10 calls of QuickSelect, each taking linear time. 10 is a number fixed in the problem definition, so the whole operation counts as only linear time.
There is a variation of Boyer-Moore Voting algorithm which can find all the elements that occurs more than n/k
in a input which runs in O(nk)
and since k = 10
for your problem I think it should run in O(n * 10) = O(n)
time.
From here
Following is an interesting O(nk) solution: We can solve the above problem in O(nk) time using O(k-1) extra space. Note that there can never be more than k-1 elements in output (Why?). There are mainly three steps in this algorithm.
1) Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements). Following is structure of temporary array elements.
struct eleCount { int element; int count; }; struct eleCount temp[]; This step takes O(k) time.
2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.
3) Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.
The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).
Consider k = 4, n = 9 Given array: 3 1 2 2 2 1 4 3 3
i = 0
3 _ _ temp[] has one element, 3 with count 1
i = 1
3 1 _ temp[] has two elements, 3 and 1 with counts 1 and 1 respectively
i = 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 1 respectively.
i = 3
- - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.
i = 4
- - 2 - - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 3 respectively.
i = 5
- - 2 - 1 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 2 and 3 respectively.
Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.
i = 6
- - 2 - 1 2 temp[] has two elements, 1 and 2 with counts as 1 and 2 respectively.
i = 7
- 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.
i = 8
3 - 2 3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 2, 1 and 2 respectively.
Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.
For a proper proof of this approach check out this answer from cs.stackexchange
You are right.
Decimal dominant is among the next candidates after QuickSelect: n/10, 2*n/10..9*n/10, so checking only n/10 index is not sufficient
Note that dominant occupies long run in sorted array and certainly at least one of elements with mentioned indexes belongs to that run.
Example for k = 3, N = 11. Let element b occupies at least 1/3 of array. In this case sorted array might look like
b b b b * * * * * * *
* b b b b * * * * * *
* * b b b b * * * * *
* * * b b b b * * * *
* * * * b b b b * * *
* * * * * b b b b * *
* * * * * b b b b * *
* * * * * * b b b b *
* * * * * * * b b b b
^ ^ //positions for quickselect
Note that in any case dominant element (if k-dominant does exist) occupies at least one of marked places. So after two rounds of QuickSelect we have two candidates