How to implode(reverse of pandas explode) based on a column

I have a dataframe df like below

  NETWORK       config_id       APPLICABLE_DAYS  Case    Delivery  
0   Grocery     5399            SUN               10       1        
1   Grocery     5399            MON               20       2       
2   Grocery     5399            TUE               30       3        
3   Grocery     5399            WED               40       4

I want to implode( combine Applicable_days from multiple rows into single row like below) and get the average case and delivery per config_id

  NETWORK       config_id       APPLICABLE_DAYS      Avg_Cases    Avg_Delivery 
0   Grocery     5399            SUN,MON,TUE,WED         90           10

using the groupby on network,config_id i can get the avg_cases and avg_delivery like below.

df.groupby(['network','config_id']).agg({'case':'mean','delivery':'mean'})

But How do i be able to join APPLICABLE_DAYS while performing this aggregation?

If you want the "opposite" of explode, then that means bringing it into a list in Solution #1. You can also join as a string in Solution #2:

Use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column within your .agg groupby function:

df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5

Use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column within your .agg groupby function:

 df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399       SUN,MON,TUE,WED         25           2.5

If you are looking for the sum, then you can just change mean to sum for the Cases and Delivery columns.

Your results look more like a sum, than average; The solution below uses named aggregation :

    df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Total_Cases=("Case", "sum"),
    Total_Delivery=("Delivery", "sum"),
)

                        APPLICABLE_DAYS       Total_Cases   Total_Delivery
NETWORK config_id           
Grocery 5399                SUN,MON,TUE,WED           100      10

If it is the mean, then you can change the 'sum' to 'mean' :

df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Avg_Cases=("Case", "mean"),
    Avg_Delivery=("Delivery", "mean"),
)

                    APPLICABLE_DAYS   Avg_Cases Avg_Delivery
NETWORK config_id           
Grocery 5399         SUN,MON,TUE,WED      25      2.5

How to implode(reverse of pandas explode) based on a column

Related

Recent Posts