How to impute NaN values in a pandas DataFrame from a multi-index DataFrame?

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, 20, 4],
                   [3, 1, np.nan, 1],
                   [3, 1, 15, 1],
                   [np.nan, 1, np.nan, 1],
                   [10, 1, 30, 4],
                   [50, 2, 35, 4],
                   [10, 1, 37, 4],
                   [40, 2, 30, 1]],
                  columns=list("ABCD"))

I want to fill the NaN values with the mean of their group, where groups are defined by columns B and D. To that end, I run the following:

df_mean=df.groupby(["B","D"]).mean()
df_mean

        A     C
B D
1 1   3.0  15.0
  4  10.0  33.5
2 1  40.0  30.0
  4  50.0  27.5

Is there a way to fill the dataframe df with the values computed in df_mean?

One way to do this is the approach from this answer:

df[["A", "C"]] = (
    df
    # create groups
    .groupby(["B", "D"])
    # transform the groups by filling na values with the group mean
    .transform(lambda x: x.fillna(x.mean()))
)

However, with a few million rows this takes too long, whereas a plain groupby([...]).mean() takes only a few seconds.

Is there a quicker way to solve this?


Use GroupBy.transform with 'mean' and pass the result to DataFrame.fillna:

df = df.fillna(df.groupby(["B", "D"]).transform('mean'))
print(df)
      A  B     C  D
0  50.0  2  20.0  4
1   3.0  1  15.0  1
2   3.0  1  15.0  1
3   3.0  1  15.0  1
4  10.0  1  30.0  4
5  50.0  2  35.0  4
6  10.0  1  37.0  4
7  40.0  2  30.0  1 
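
To check the difference on a larger frame, the two approaches can be timed side by side. This is only a minimal sketch on synthetic data: the frame `big`, its size, the number of groups, and the NaN fraction are assumptions for illustration, and actual timings depend on the machine and the pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration, not the asker's real data).
rng = np.random.default_rng(0)
n = 200_000
big = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.integers(0, 50, size=n),
    "C": rng.normal(size=n),
    "D": rng.integers(0, 50, size=n),
})
# Set roughly 10% of the values in A and in C to NaN.
big.loc[rng.random(n) < 0.1, "A"] = np.nan
big.loc[rng.random(n) < 0.1, "C"] = np.nan

# Approach from the question: a Python lambda is called for every group.
start = time.perf_counter()
slow = big.copy()
slow[["A", "C"]] = (
    big.groupby(["B", "D"])[["A", "C"]]
    .transform(lambda x: x.fillna(x.mean()))
)
t_lambda = time.perf_counter() - start

# Approach from this answer: built-in 'mean' transform, then one fillna.
start = time.perf_counter()
fast = big.fillna(big.groupby(["B", "D"]).transform("mean"))
t_builtin = time.perf_counter() - start

print(f"lambda transform: {t_lambda:.3f}s, built-in mean: {t_builtin:.3f}s")

# Both approaches should produce the same filled frame (up to float rounding).
pd.testing.assert_frame_equal(slow, fast)
```

The string alias 'mean' lets pandas compute all group means in one vectorized pass instead of calling a Python function once per group, which is where the lambda version spends its time.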

Your aggregation-based approach also works if you join the group means back onto the B and D columns:

df = df.fillna(df[['B','D']].join(df.groupby(["B","D"]).mean(), on=['B','D']))
print(df)
      A  B     C  D
0  50.0  2  20.0  4
1   3.0  1  15.0  1
2   3.0  1  15.0  1
3   3.0  1  15.0  1
4  10.0  1  30.0  4
5  50.0  2  35.0  4
6  10.0  1  37.0  4
7  40.0  2  30.0  1
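
To make the join step easier to follow, here is a small sketch that builds the intermediate frame on the sample data and checks it against the direct transform. The names `means`, `aligned`, and `direct` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, 20, 4],
                   [3, 1, np.nan, 1],
                   [3, 1, 15, 1],
                   [np.nan, 1, np.nan, 1],
                   [10, 1, 30, 4],
                   [50, 2, 35, 4],
                   [10, 1, 37, 4],
                   [40, 2, 30, 1]],
                  columns=list("ABCD"))

# One row of means per (B, D) group, indexed by a MultiIndex.
means = df.groupby(["B", "D"]).mean()

# Look up each row's group means through its (B, D) key.
# The result has one row per row of df and keeps df's index.
aligned = df[["B", "D"]].join(means, on=["B", "D"])

# The A and C columns match the direct transform, so either frame
# can be passed to fillna.
direct = df.groupby(["B", "D"]).transform("mean")
pd.testing.assert_frame_equal(aligned[["A", "C"]], direct)
print(aligned)
```

Because `aligned` shares its index with df, fillna can match values by row label and column name without any further reshaping.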

You can also use combine_first, which fills the NaN values in df from the transformed group means:

out = df.combine_first(df.groupby(['B', 'D']).transform('mean'))
print(out)

# Output
      A  B     C  D
0  50.0  2  20.0  4
1   3.0  1  15.0  1
2   3.0  1  15.0  1
3   3.0  1  15.0  1
4  10.0  1  30.0  4
5  50.0  2  35.0  4
6  10.0  1  37.0  4
7  40.0  2  30.0  1
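
If the real frame also contains non-numeric columns, taking the mean over every column may raise an error or drop columns, depending on the pandas version. A possible workaround, sketched here under that assumption, is to select the columns to impute explicitly. The frame `df2` and its text column `label` are hypothetical and not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an extra text column that the question does not have.
df2 = pd.DataFrame({
    "A": [np.nan, 3, 3, 10],
    "B": [1, 1, 1, 2],
    "C": [20, np.nan, 15, 30],
    "D": [4, 4, 4, 1],
    "label": ["x", "y", "z", "w"],
})

# Columns to impute; everything else is left untouched.
cols = ["A", "C"]

# Group means for the selected columns only, aligned to df2's index.
group_means = df2.groupby(["B", "D"])[cols].transform("mean")

filled = df2.copy()
filled[cols] = df2[cols].fillna(group_means)
print(filled)
```

Restricting the groupby to `cols` also avoids computing means for columns that never need filling.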