Count elements in defined groups in pandas dataframe

Solution 1:

You can use value_counts and reindex:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 5, 1, 5, 1, 1, 4, 3]})

elem_list = [1,5,2]
df['col1'].value_counts().reindex(elem_list, fill_value=0)

output:

1    5
5    2
2    0
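
As a rough sketch, the same idea can be wrapped in a small helper. The name count_in_groups is hypothetical and not part of the original answer; the data is the sample frame from above.

```python
import pandas as pd

def count_in_groups(series, groups):
    """Count occurrences of each value in `groups`, reporting 0 for absent values."""
    # value_counts() counts every distinct value; reindex() keeps only the
    # requested labels, in the requested order, filling absent ones with 0.
    return series.value_counts().reindex(groups, fill_value=0)

df = pd.DataFrame({'col1': [1, 1, 5, 1, 5, 1, 1, 4, 3]})
counts = count_in_groups(df['col1'], [1, 5, 2])
print(counts.to_dict())  # {1: 5, 5: 2, 2: 0}
```

Values that are not in the list (4 and 3 here) are counted by value_counts but dropped by reindex.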

benchmark (100k values):

# setup
import numpy as np
df = pd.DataFrame({'col1': np.random.randint(0, 10, size=100000)})

df['col1'].value_counts().reindex(elem_list, fill_value=0)
# 774 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

pd.Categorical(df['col1'],elem_list).value_counts()
# 2.72 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df.loc[df["col1"].isin(elem_list), 'col1'].value_counts().reindex(elem_list, fill_value=0)
# 2.98 ms ± 152 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
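
To reproduce a comparison like this outside IPython, something along these lines should work. It is only a sketch: it uses the standard-library timeit module instead of %timeit, and a seeded np.random.default_rng instead of np.random.randint so the data is repeatable (both are substitutions, not what the original benchmark used). Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
df = pd.DataFrame({'col1': rng.integers(0, 10, size=100_000)})
elem_list = [1, 5, 2]

# The three approaches from this page, wrapped so they can be timed uniformly.
candidates = {
    'value_counts + reindex': lambda: (
        df['col1'].value_counts().reindex(elem_list, fill_value=0)
    ),
    'Categorical': lambda: (
        pd.Categorical(df['col1'], categories=elem_list).value_counts()
    ),
    'isin + value_counts + reindex': lambda: (
        df.loc[df['col1'].isin(elem_list), 'col1']
        .value_counts()
        .reindex(elem_list, fill_value=0)
    ),
}

# Check that all approaches agree before comparing their speed.
results = {name: fn().tolist() for name, fn in candidates.items()}

runs = 20
for name, fn in candidates.items():
    per_call = timeit.timeit(fn, number=runs) / runs
    print(f'{name}: {per_call * 1e6:.0f} µs per call')
```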

Solution 2:

Pass the column to pd.Categorical with elem_list as the categories; value_counts then returns 0 for any category that does not occur:

pd.Categorical(df['col1'],elem_list).value_counts()
output (computed on a different frame than the one defined in Solution 1):
1    3
5    0
2    1
dtype: int64
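
For comparison, here is a minimal sketch of the same call on the sample frame from Solution 1. Values that are not among the categories (4 and 3 here) become missing values in the Categorical and are left out of the counts.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 5, 1, 5, 1, 1, 4, 3]})
elem_list = [1, 5, 2]

# Anything not listed in `categories` is stored as a missing value.
cat = pd.Categorical(df['col1'], categories=elem_list)
n_outside = int(cat.isna().sum())

# The result has one row per category, in category order, including zeros.
counts = cat.value_counts()
print(counts.tolist())  # [5, 2, 0]
print(n_outside)        # 2
```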

Solution 3:

First filter the column with Series.isin inside DataFrame.loc, then call Series.value_counts. Finally, add Series.reindex if the order matters or if elements with no matches should appear with a count of 0:

df.loc[df["col1"].isin(elem_list), 'col1'].value_counts().reindex(elem_list, fill_value=0)
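
A short sketch on the sample frame from Solution 1 shows what each step contributes. The intermediate names filtered and raw are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 5, 1, 5, 1, 1, 4, 3]})
elem_list = [1, 5, 2]

# Keep only the rows whose value is in elem_list.
filtered = df.loc[df['col1'].isin(elem_list), 'col1']

# Without reindex, the absent value 2 does not appear in the result at all.
raw = filtered.value_counts()
print(2 in raw.index)  # False

# reindex restores the order of elem_list and adds the missing label with 0.
counts = raw.reindex(elem_list, fill_value=0)
print(counts.to_dict())  # {1: 5, 5: 2, 2: 0}
```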
