Unable to calculate the aggregated mean
Solution 1:
You can group by "Customer_key", compute the mean of "purchase_amount", and use transform so that the result has one value per row of the DataFrame (this is needed for np.where, where the values to choose from must be broadcastable against each other). Note that the mean method skips NaN values by default, so each row receives the mean of the non-NaN values in its group.
Then, using np.where, fill in the group-specific mean wherever "purchase_amount" is NaN and keep the original value otherwise:
means = df.groupby('Customer_key')['purchase_amount'].transform('mean')
df['purchase_amount'] = np.where(df['purchase_amount'].isna(), means, df['purchase_amount'])
Or you can use fillna:
df['purchase_amount'] = df['purchase_amount'].fillna(means)
For example, if you had df as below:
Customer_key purchase_amount Date
0 12633 4435.0 08/07/2021
1 34243 7344.0 11/11/2021
2 54355 4642.0 10/11/2020
3 12633 6322.0 11/12/2021
3 12633 NaN 11/12/2021
then both of the above options produce:
Customer_key purchase_amount Date
0 12633 4435.0 08/07/2021
1 34243 7344.0 11/11/2021
2 54355 4642.0 10/11/2020
3 12633 6322.0 11/12/2021
3 12633 5378.5 11/12/2021
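To try this end to end, here is a minimal self-contained sketch that rebuilds the example frame and applies both options. The names via_where, via_fillna and filled are only illustrative, and the frame is constructed with a default integer index, so the printed row labels will be 0 to 4 rather than the labels shown above.

```python
import numpy as np
import pandas as pd

# Rebuild the example data; the last row has a missing purchase_amount.
df = pd.DataFrame({
    "Customer_key": [12633, 34243, 54355, 12633, 12633],
    "purchase_amount": [4435.0, 7344.0, 4642.0, 6322.0, np.nan],
    "Date": ["08/07/2021", "11/11/2021", "10/11/2020", "11/12/2021", "11/12/2021"],
})

# Per-row group mean; NaN values are skipped when the mean is computed.
means = df.groupby("Customer_key")["purchase_amount"].transform("mean")

# Option 1: np.where picks the group mean only where the value is missing.
via_where = np.where(df["purchase_amount"].isna(), means, df["purchase_amount"])

# Option 2: fillna aligns on the index and fills only the missing values.
via_fillna = df["purchase_amount"].fillna(means)

# Work on a copy so the original frame stays available for comparison.
filled = df.assign(purchase_amount=via_fillna)
print(filled)
```

Both options give the same column here. One thing to keep in mind: if every "purchase_amount" for a customer is NaN, the group mean is NaN as well, so those rows stay unfilled.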