Pandas: filling missing values by mean in each group
One way would be to use transform:
>>> df
  name  value
0    A      1
1    A    NaN
2    B    NaN
3    B      2
4    B      3
5    B      1
6    C      3
7    C    NaN
8    C      3
>>> df["value"] = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean()))
>>> df
  name  value
0    A      1
1    A      1
2    B      2
3    B      2
4    B      3
5    B      1
6    C      3
7    C      3
8    C      3
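To reproduce this as a script, here is a minimal sketch. The way the frame is built (a dict of lists with `np.nan`) and the `filled` variable name are my assumptions, since only the printed frame is shown above. Note that the column holds floats because it contains NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (construction is assumed; only its printed form is given).
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "value": [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
})

# Within each group, replace NaN with that group's mean.
# Selecting ["value"] keeps the result a Series even if df has other columns.
filled = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean()))
df["value"] = filled

print(df)
```

A group whose values are all NaN has a NaN mean, so its rows would stay NaN with this approach.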
fillna + groupby + transform + mean
This seems intuitive:
df['value'] = df['value'].fillna(df.groupby('name')['value'].transform('mean'))
The groupby + transform syntax maps the groupwise mean to the index of the original dataframe. This is roughly equivalent to @DSM's solution, but avoids the need to define an anonymous lambda function.
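To make the index mapping concrete, here is a small sketch that prints the intermediate Series and checks both approaches against each other. The frame is rebuilt by hand from the example, and the names `group_mean`, `via_fillna` and `via_lambda` are mine, not from either answer.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt by hand (construction is an assumption).
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "value": [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
})

# transform("mean") returns one value per original row, on the same index as df,
# so every row carries the mean of the group it belongs to.
group_mean = df.groupby("name")["value"].transform("mean")
print(group_mean.tolist())

# fillna aligns on the index and only touches the rows that are NaN.
via_fillna = df["value"].fillna(group_mean)

# The lambda-based version from the other answer, for comparison.
via_lambda = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean()))

print((via_fillna == via_lambda).all())
```

Because `group_mean` has the same length and index as `df`, it can be passed straight to `fillna` without any merge or join.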