return the top n most frequently occurring chars and their respective counts in python
How do I return the top n most frequently occurring chars and their respective counts? E.g. 'aaaaaabbbbcccc', 2 should return [('a', 6), ('b', 4)] in Python.
I tried this:
def top_chars(input, n):
    list1=list(input)
    list3=[]
    list2=[]
    list4=[]
    set1=set(list1)
    list2=list(set1)
    def count(item):
        count=0
        for x in input:
            if x in input:
                count+=item.count(x)
        list3.append(count)
        return count
    list2.sort(key=count)
    list3.sort()
    list4=list(zip(list2,list3))
    list4.reverse()
    list4.sort(key=lambda list4: ((list4[1]),(list4[0])), reverse=True)
    return list4[0:n]
    pass
But it doesn't work for the input ("aabc", 2). The output it should give is
[('a', 2), ('b', 1)]
but the output I get is
[('a', 2), ('c', 1)]
Use collections.Counter(); it has a most_common() method that does just that:
>>> from collections import Counter
>>> counts = Counter('aaaaaabbbbcccc')
>>> counts.most_common(2)
[('a', 6), ('c', 4)]
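Note that for both the above input and aabc, b and c have the same count, so either is a valid top contender. Your code sorts by count and then by key in reverse, so c is sorted before b. The Counter output above shows the same order, but that is not guaranteed: most_common() does not break ties by key, and on Python 3.7 and later elements with equal counts are returned in first-encountered order, so you would see ('b', 4) there.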
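The effect of a reversed tuple sort on tied counts can be seen in isolation. This is only a small illustration; the pairs list is a hand-written stand-in for the counted characters of 'aabc', not output from your function:

```python
# Hand-written (char, count) pairs for 'aabc'; 'b' and 'c' are tied.
pairs = [('a', 2), ('b', 1), ('c', 1)]

# Sorting on (count, char) with reverse=True reverses BOTH parts of the key,
# so among equal counts the later letter comes first.
pairs.sort(key=lambda p: (p[1], p[0]), reverse=True)
print(pairs)  # [('a', 2), ('c', 1), ('b', 1)]
```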
If, instead of sorting in reverse, you used the negative count as the sort key, you'd sort b before c again:
list4.sort(key=lambda v: (-v[1], v[0]))
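Combining Counter with that sort key, a minimal sketch of the whole function might look like this. It assumes ties should be broken alphabetically, as in your expected output; the text parameter name is my own choice, used to avoid shadowing the built-in input:

```python
from collections import Counter


def top_chars(text, n):
    """Return the n most common characters in text with their counts."""
    counts = Counter(text)
    # The negative count puts the highest counts first; the character
    # itself breaks ties in ascending (alphabetical) order.
    ranked = sorted(counts.items(), key=lambda v: (-v[1], v[0]))
    return ranked[:n]


print(top_chars('aaaaaabbbbcccc', 2))  # [('a', 6), ('b', 4)]
print(top_chars('aabc', 2))            # [('a', 2), ('b', 1)]
```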
Note that Counter.most_common() does not actually sort when you ask for fewer items than there are keys in the counter; it uses a heapq-based algorithm instead to get only the top N items.
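If you want that kind of partial selection together with a deterministic tie-break, one option is to call heapq yourself with the same tuple key. This is a sketch under my own assumptions (the top_chars_heap name is hypothetical), not a copy of how Counter does it internally:

```python
import heapq
from collections import Counter


def top_chars_heap(text, n):
    """Return the n most common characters, ties broken alphabetically."""
    counts = Counter(text)
    # nsmallest on (-count, char) yields the highest counts first and,
    # when n is smaller than the number of distinct characters, avoids
    # fully sorting every item.
    return heapq.nsmallest(n, counts.items(), key=lambda v: (-v[1], v[0]))


print(top_chars_heap('aabc', 2))  # [('a', 2), ('b', 1)]
```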