Determine the most common occurrence in an array

Assume I have an array of doubles that looks like the following:

Array[10] = {10, 10, 10, 3, 10, 10, 6, 10, 10, 9, 10}

I need a function that can determine what the MAJORTY vote is in the array, in this case "10" because it is the number that appears the most often... And of course there is the situation when no majority exists (where they are equal), in that case I need to throw an exception...

Any clues? Aside from doing some really nasty looping on the array (for each index, determine how many exist with the same value, store a count in the array, and then scan the count array for the highest number and the value at that position is the winner, etc...)


Solution 1:

Using a Map<Integer, Integer> should be simple as:

int mostFrequent(int... ary) {
    Map<Integer, Integer> m = new HashMap<Integer, Integer>();

    for (int a : ary) {
        Integer freq = m.get(a);
        m.put(a, (freq == null) ? 1 : freq + 1);
    }

    int max = -1;
    int mostFrequent = -1;

    for (Map.Entry<Integer, Integer> e : m.entrySet()) {
        if (e.getValue() > max) {
            mostFrequent = e.getKey();
            max = e.getValue();
        }
    }

    return mostFrequent;
}

Solution 2:

Your first problem is that you have an "array of doubles", because equality is problematic with floating point data (identical numerical values can be represented by different bit patters, among other things). If your doubles are in fact (as in the example) integers, use int instead. Otherweise, think long and hard about how you define what values are equal for the purpose of representing the same vote.

As for determining the majority vote, use a Map with the "vote id" as key and the number of votes as value - then in the end iterate through the map to find the maximal value.

Solution 3:

Sort the array first w/ quick sort and then scan and count for a majority - O(n ln n). If the range of elements is known ahead of time, say between {1,k}, then a counting sort can be used which will run in O(n+k).

As a slight improvement, as you are scanning the sorted array, if you find value that has more that n/2 occurrences you are done.

Solution 4:

With an array of doubles this might not be easy since equality comparisons on doubles are pretty problematic. If you can get away with using integers, you can do something like the following:

    HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
    for(int element: Array)
    {
        Integer frequency = map.get(element);
        map.put(element, (frequency != null) ? frequency + 1 : 1);      
    }
    int mostFrequentItem  = 0;
    int[] maxFrequencies  = new int[2];
    maxFrequencies[0]     = Integer.MIN_VALUE;

    for(Entry<Integer, Integer> entry: map.entrySet())
    {
        if(entry.getValue()>= maxFrequencies[0])
        {
            mostFrequentItem  = entry.getKey();
            maxFrequencies[1] = maxFrequencies[0];
            maxFrequencies[0] = entry.getValue();
        }
    }
    if(maxFrequencies[1] == maxFrequencies[0])
        throw new Exception();//insert whatever exception seems appropriate
            return mostFrequentItem  

This will have O(n) performance, so it should be pretty optimal in asymptotic performance behaviour. If your doubles are not the results of calculations but come from an other source, that is if you can be sure that values which are basically the same will be represented equally, you might get away with using the same method for doubles, however I would still recommend being careful that this is really the case.

Edit: some performance improvements as suggested in the comment as well as supporting checking for ambiguous case

Solution 5:

As @Grizzly points out, doubles are problematic from a computational standpoint. I would also suggest that they don't make sense from the standpoint of your problem domain; doubles don't make any sense with majority voting!

So lets assume that 10 and 6 and so on are integer identifiers for the things people are voting for. Lets also assume that you know that users can vote any value from 0 to 10.

int[] votes = ...
int[] voteCounts = new int[11];  // 11 could be calculated ...
for (int vote : votes) {
    voteCounts[vote]++;
}
int majority = (votes.length + 1) / 2;
for (int i = 0; i < voteCounts.length; i++) {
    if (voteCounts[i] >= majority) {
        return i;  // the winner!
    }
}
throw new NoClearMajorityException(...);

This algorithm is O(N) in time and O(M) in space, where M is the largest identifier. The catch is that it only works (as written) if the identifiers are integers.