Python Pandas : pivot table with aggfunc = count unique distinct
Solution 1:
Do you mean something like this?
>>> df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=lambda x: len(x.unique()))
Z    Z1   Z2   Z3
Y
Y1    1    1  NaN
Y2  NaN  NaN    1
Note that using len assumes you don't have NAs in your DataFrame. You can use x.value_counts().count() or len(x.dropna().unique()) otherwise.
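The difference between these variants is easiest to see on a single group that contains a missing value. The sketch below uses a small hypothetical Series `s` (not part of the question's data) and relies on the pandas defaults, where `value_counts()` and `nunique()` skip NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical group: two identical values plus one missing value.
s = pd.Series(["X1", "X1", np.nan])

with_na = len(s.unique())                    # NaN is counted as its own value
dropna_first = len(s.dropna().unique())      # NaN removed before counting
via_value_counts = s.value_counts().count()  # value_counts() skips NaN by default
via_nunique = s.nunique()                    # nunique() skips NaN by default

print(with_na, dropna_first, via_value_counts, via_nunique)  # 2 1 1 1
```

Only the plain `len(x.unique())` form counts the missing value, which is why it overstates the distinct count when NAs are present.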
Solution 2:
This is a good way of counting entries within .pivot_table:
>>> df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')
        X1   X2
Y  Z
Y1 Z1    1    1
   Z2    1  NaN
Y2 Z3    1  NaN
Solution 3:
Since at least version 0.16 of pandas, pivot_table no longer accepts the "rows" parameter (nor "cols"); use "index" and "columns" instead.
As of 0.23, the solution would be:
df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)
which returns:
Z    Z1   Z2   Z3
Y
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0
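For a self-contained run, the call can be tried on a small frame. The `df2` below is an assumption, chosen only to be consistent with the output shown; the original data is not included in this answer.

```python
import pandas as pd

# Assumed stand-in for df2 (the real frame is not shown in this thread).
df2 = pd.DataFrame({
    "X": ["X1", "X1", "X1", "X1"],
    "Y": ["Y2", "Y1", "Y1", "Y1"],
    "Z": ["Z3", "Z1", "Z1", "Z2"],
})

# Count the distinct X values for each (Y, Z) combination.
result = df2.pivot_table(values="X", index="Y", columns="Z",
                         aggfunc=pd.Series.nunique)
print(result)
```

Combinations of Y and Z that never occur in the data come back as NaN, which is why the counts are displayed as floats.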
Solution 4:
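aggfunc=pd.Series.nunique provides a distinct count. The full code follows (written with index= and columns=, which replace the older rows= and cols= parameters that current pandas no longer accepts):
df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)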
Credit to @hume for this solution (see comment under the accepted answer). Adding as an answer here for better discoverability.
Solution 5:
- The aggfunc parameter in pandas.DataFrame.pivot_table will take 'nunique' as a string, or in a list
  - pandas.Series.nunique or pandas.core.groupby.DataFrameGroupBy.nunique
- Tested in pandas 1.3.1
out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count', lambda x: len(x.unique()), len])
[out]:
   nunique            count            <lambda>            len
Z       Z1   Z2   Z3     Z1   Z2   Z3        Z1   Z2   Z3   Z1   Z2   Z3
Y
Y1     1.0  1.0  NaN    2.0  1.0  NaN       1.0  1.0  NaN  2.0  1.0  NaN
Y2     NaN  NaN  1.0    NaN  NaN  1.0       NaN  NaN  1.0  NaN  NaN  1.0
out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')
[out]:
Z    Z1   Z2   Z3
Y
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0
out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique'])
[out]:
   nunique
Z       Z1   Z2   Z3
Y
Y1     1.0  1.0  NaN
Y2     NaN  NaN  1.0