Find the most frequent number in a NumPy array

Suppose I have the following NumPy array:

a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])

How can I find the most frequent number in this array?

If your list contains all non-negative ints, you should take a look at numpy.bincounts:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

and then probably use np.argmax:

a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
counts = np.bincount(a)
print(np.argmax(counts))

For a more complicated list (that perhaps contains negative numbers or non-integer values), you can use np.histogram in a similar way. Alternatively, if you just want to work in python without using numpy, collections.Counter is a good way of handling this sort of data.

from collections import Counter
a = [1,2,3,1,2,1,1,1,3,2,2,1]
b = Counter(a)
print(b.most_common(1))

You may use

values, counts = np.unique(a, return_counts=True)

ind = np.argmax(counts)
print(values[ind])  # prints the most frequent element

ind = np.argpartition(-counts, kth=10)[:10]
print(values[ind])  # prints the 10 most frequent elements

If some element is as frequent as another one, this code will return only the first element.

If you're willing to use SciPy:

>>> from scipy.stats import mode
>>> mode([1,2,3,1,2,1,1,1,3,2,2,1])
(array([ 1.]), array([ 6.]))
>>> most_frequent = mode([1,2,3,1,2,1,1,1,3,2,2,1])[0][0]
>>> most_frequent
1.0

Performances (using iPython) for some solutions found here:

>>> # small array
>>> a = [12,3,65,33,12,3,123,888000]
>>> 
>>> import collections
>>> collections.Counter(a).most_common()[0][0]
3
>>> %timeit collections.Counter(a).most_common()[0][0]
100000 loops, best of 3: 11.3 µs per loop
>>> 
>>> import numpy
>>> numpy.bincount(a).argmax()
3
>>> %timeit numpy.bincount(a).argmax()
100 loops, best of 3: 2.84 ms per loop
>>> 
>>> import scipy.stats
>>> scipy.stats.mode(a)[0][0]
3.0
>>> %timeit scipy.stats.mode(a)[0][0]
10000 loops, best of 3: 172 µs per loop
>>> 
>>> from collections import defaultdict
>>> def jjc(l):
...     d = defaultdict(int)
...     for i in a:
...         d[i] += 1
...     return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]
... 
>>> jjc(a)[0]
3
>>> %timeit jjc(a)[0]
100000 loops, best of 3: 5.58 µs per loop
>>> 
>>> max(map(lambda val: (a.count(val), val), set(a)))[1]
12
>>> %timeit max(map(lambda val: (a.count(val), val), set(a)))[1]
100000 loops, best of 3: 4.11 µs per loop
>>>

Best is 'max' with 'set' for small arrays like the problem.

According to @David Sanders, if you increase the array size to something like 100,000 elements, the "max w/set" algorithm ends up being the worst by far whereas the "numpy bincount" method is the best.

Find the most frequent number in a NumPy array

Performances (using iPython) for some solutions found here:

Related

Recent Posts