what is the most efficient way of counting occurrences in pandas?
Solution 1:
I think df['word'].value_counts()
should serve. By skipping the groupby machinery, you'll save some time. I'm not sure why count
should be much slower than max
. Both take some time to avoid missing values. (Compare with size
.)
In any case, value_counts has been specifically optimized to handle object type, like your words, so I doubt you'll do much better than that.
Solution 2:
When you want to count the frequency of categorical data in a column in pandas dataFrame use: df['Column_Name'].value_counts()
-Source.
Solution 3:
Just an addition to the previous answers. Let's not forget that when dealing with real data there might be null values, so it's useful to also include those in the counting by using the option dropna=False
(default is True
)
An example:
>>> df['Embarked'].value_counts(dropna=False)
S 644
C 168
Q 77
NaN 2