Unable to calculate the aggregated mean

Solution 1:

You can group by "Customer_key", compute the mean of "purchase_amount" for each group, and use transform to broadcast that mean back to every row of the DataFrame. The row-aligned result is needed because np.where requires the values it chooses from to be broadcastable against the condition. Note that mean skips NaN values by default, so each row receives the mean of the non-NaN values in its group.

Then use np.where: where "purchase_amount" is NaN, fill in the group-specific mean; otherwise keep the original value.

import numpy as np

means = df.groupby('Customer_key')['purchase_amount'].transform('mean')
df['purchase_amount'] = np.where(df['purchase_amount'].isna(), means, df['purchase_amount'])

Alternatively, you can use fillna with the same means:

df['purchase_amount'] = df['purchase_amount'].fillna(means)
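
To see why transform is used here rather than a plain aggregation, the following minimal sketch may help. It uses a small made-up DataFrame (the values are hypothetical, not from the question) and compares the shape of the two results:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the shapes
toy = pd.DataFrame({
    "Customer_key": [1, 1, 2],
    "purchase_amount": [10.0, np.nan, 30.0],
})

grouped = toy.groupby("Customer_key")["purchase_amount"]

# Plain aggregation: one value per group, indexed by Customer_key
per_group = grouped.mean()

# transform: one value per original row, aligned with toy's index
per_row = grouped.transform("mean")

print(len(per_group), len(per_row))
```

The aggregated result has one entry per customer, while the transformed result has one entry per row, which is what lets it line up with the original column in np.where or fillna.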

For example, if you had df as below:

   Customer_key  purchase_amount        Date
0         12633           4435.0  08/07/2021
1         34243           7344.0  11/11/2021
2         54355           4642.0  10/11/2020
3         12633           6322.0  11/12/2021
3         12633              NaN  11/12/2021

both of the above options produce:

   Customer_key  purchase_amount        Date
0         12633           4435.0  08/07/2021
1         34243           7344.0  11/11/2021
2         54355           4642.0  10/11/2020
3         12633           6322.0  11/12/2021
3         12633           5378.5  11/12/2021
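
The example above can be reproduced with a self-contained sketch like the one below. It rebuilds the sample data by hand and, as an assumption, uses the default 0..4 index rather than the index labels printed in the table:

```python
import numpy as np
import pandas as pd

# Sample data copied from the example; the index is the default RangeIndex (an assumption)
df = pd.DataFrame({
    "Customer_key": [12633, 34243, 54355, 12633, 12633],
    "purchase_amount": [4435.0, 7344.0, 4642.0, 6322.0, np.nan],
    "Date": ["08/07/2021", "11/11/2021", "10/11/2020", "11/12/2021", "11/12/2021"],
})

# Per-row group mean; NaN values are skipped when computing each mean
means = df.groupby("Customer_key")["purchase_amount"].transform("mean")

# Option 1: np.where picks the group mean where the value is missing
filled_where = np.where(df["purchase_amount"].isna(), means, df["purchase_amount"])

# Option 2: fillna with the row-aligned means
filled_fillna = df["purchase_amount"].fillna(means)

df["purchase_amount"] = filled_fillna
print(df)
```

For customer 12633 the two known amounts are 4435.0 and 6322.0, so the missing value is filled with (4435.0 + 6322.0) / 2 = 5378.5 in both options.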
