Counting non zero values in each column of a dataframe in python
I have a python-pandas-DataFrame in which first column is "user_id"
and rest of the columns are tags("Tag_0"
to "Tag_122"
).
I have the data in the following format:
UserId Tag_0 Tag_1
7867688 0 5
7867688 0 3
7867688 3 0
7867688 3.5 3.5
7867688 4 4
7867688 3.5 0
My aim is to achieve Sum(Tag)/Count(NonZero(Tags))
for each user_id
df.groupby('user_id').sum()
, gives me sum(tag)
, however I am clueless about counting non zero values
Is it possible to achieve Sum(Tag)/Count(NonZero(Tags))
in one command?
In MySQL I could achieve this as follows:-
select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1
Any help shall be appreciated.
My favorite way of getting number of nonzeros in each column is
df.astype(bool).sum(axis=0)
For the number of non-zeros in each row use
df.astype(bool).sum(axis=1)
(Thanks to Skulas)
If you have nans in your df you should make these zero first, otherwise they will be counted as 1.
df.fillna(0).astype(bool).sum(axis=1)
(Thanks to SirC)
Why not use np.count_nonzero
?
- To count the number of non-zeros of an entire dataframe,
np.count_nonzero(df)
- To count the number of non-zeros of all rows
np.count_nonzero(df, axis=0)
- To count the number of non-zeros of all columns
np.count_nonzero(df, axis=1)
It works with dates too.
To count nonzero values, just do (column!=0).sum()
, where column
is the data you want to do it for. column != 0
returns a boolean array, and True is 1 and False is 0, so summing this gives you the number of elements that match the condition.
So to get your desired result, do
df.groupby('user_id').apply(lambda column: column.sum()/(column != 0).sum())