How to implode(reverse of pandas explode) based on a column
I have a dataframe df like below
NETWORK config_id APPLICABLE_DAYS Case Delivery
0 Grocery 5399 SUN 10 1
1 Grocery 5399 MON 20 2
2 Grocery 5399 TUE 30 3
3 Grocery 5399 WED 40 4
I want to implode( combine Applicable_days from multiple rows into single row like below) and get the average case and delivery per config_id
NETWORK config_id APPLICABLE_DAYS Avg_Cases Avg_Delivery
0 Grocery 5399 SUN,MON,TUE,WED 90 10
using the groupby on network,config_id i can get the avg_cases and avg_delivery like below.
df.groupby(['network','config_id']).agg({'case':'mean','delivery':'mean'})
But How do i be able to join APPLICABLE_DAYS while performing this aggregation?
If you want the "opposite" of explode, then that means bringing it into a list in Solution #1. You can also join as a string in Solution #2:
Use lambda x: x.tolist()
for the 'APPLICABLE_DAYS'
column within your .agg
groupby function:
df = (df.groupby(['NETWORK','config_id'])
.agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
.rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
.reset_index())
df
Out[1]:
NETWORK config_id APPLICABLE_DAYS Avg_Cases Avg_Delivery
0 Grocery 5399 [SUN, MON, TUE, WED] 25 2.5
Use lambda x: ",".join(x)
for the 'APPLICABLE_DAYS'
column within your .agg
groupby function:
df = (df.groupby(['NETWORK','config_id'])
.agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
.rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
.reset_index())
df
Out[1]:
NETWORK config_id APPLICABLE_DAYS Avg_Cases Avg_Delivery
0 Grocery 5399 SUN,MON,TUE,WED 25 2.5
If you are looking for the sum
, then you can just change mean
to sum
for the Cases
and Delivery
columns.
Your results look more like a sum, than average; The solution below uses named aggregation :
df.groupby(["NETWORK", "config_id"]).agg(
APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
Total_Cases=("Case", "sum"),
Total_Delivery=("Delivery", "sum"),
)
APPLICABLE_DAYS Total_Cases Total_Delivery
NETWORK config_id
Grocery 5399 SUN,MON,TUE,WED 100 10
If it is the mean, then you can change the 'sum' to 'mean' :
df.groupby(["NETWORK", "config_id"]).agg(
APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
Avg_Cases=("Case", "mean"),
Avg_Delivery=("Delivery", "mean"),
)
APPLICABLE_DAYS Avg_Cases Avg_Delivery
NETWORK config_id
Grocery 5399 SUN,MON,TUE,WED 25 2.5