How to aggregate one column based on another column in Pandas
year fruit sales
0 2010 Apple 10
1 2011 Apple 20
2 2010 Banans 50000
3 2011 Banans 30
What I want is like this:
fruit min_year sales_2010 max_year sales_2011
0 Apple 2010 10 2011 20
1 Banans 2010 50000 2011 30
First aggregate min
and max
to df1
with DataFrame.add_suffix
, then pivoting by DataFrame.pivot
with DataFrame.add_prefix
and last join toghether by concat
:
df1 = df.groupby('fruit')['year'].agg(['min','max']).add_suffix('_year')
df2 = df.pivot('fruit','year','sales').add_prefix('sales_')
df = pd.concat([df1, df2], axis=1)
print (df)
min_year max_year sales_2010 sales_2011
fruit
Apple 2010 2011 10 20
Banans 2010 2011 50000 30
One option:
(df
.pivot("fruit", "year", "sales")
.assign(min_year=lambda df: df.columns.min(),
max_year=lambda df: df.columns[:-1].max())
.rename(columns=lambda col: f"sales_{col}"
if isinstance(col, int)
else col)
.rename_axis(columns=None)
.reset_index()
)
fruit sales_2010 sales_2011 min_year max_year
0 Apple 10 20 2010 2011
1 Banans 50000 30 2010 2011
Another option, that might be more efficient:
grouper = df.groupby('fruit')
(df
.assign(min_year=grouper.year.transform("min"),
max_year=grouper.year.transform("max"))
.pivot(["fruit", "min_year", "max_year"], "year", "sales")
.add_prefix("sales_")
.rename_axis(columns=None)
.reset_index()
)
fruit min_year max_year sales_2010 sales_2011
0 Apple 2010 2011 10 20
1 Banans 2010 2011 50000 30