Filling NA based on match from other column [duplicate]

Let's suppose that I have the following dataset:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    NA
1990    Northern Ireland    Female  NA
1990    Northern Ireland    Total   NA
1991    Northern Ireland    Male    NA
1991    Northern Ireland    Female  NA
1991    Northern Ireland    Total   NA
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

and finally I want to have the following:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    792100
1990    Northern Ireland    Female  831100
1990    Northern Ireland    Total   1623300
1991    Northern Ireland    Male    792100
1991    Northern Ireland    Female  831100
1991    Northern Ireland    Total   1623300
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In other words, for each Sex I want to fill the missing values of the earlier years with the values from the first year that has no NAs.

How do I do this?


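You can chain pandas.DataFrame.sort_values, pandas.DataFrame.bfill and pandas.DataFrame.sort_index. Sorting by Sex and Time puts each category's rows together in chronological order, bfill copies the next valid value upward into the NaNs, and sort_index restores the original row order. (fillna(method='bfill') does the same thing but is deprecated in recent pandas releases.)

df = df.sort_values(['Sex', 'Time']).bfill().sort_index()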

print(df)
   Time         Geography     Sex  Population
0  1990  Northern Ireland    Male    792100.0
1  1990  Northern Ireland  Female    831100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland    Male    792100.0
4  1991  Northern Ireland  Female    831100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland    Male    792100.0
7  1992  Northern Ireland  Female    831100.0
8  1992  Northern Ireland   Total   1623300.0
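
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the question's table is typed in by hand (how `df` is built is my assumption, not something from the question) and that the names `df` and `filled` are just illustrative. Note that the back-fill runs over the whole sorted frame, so it relies on every Sex group ending with a valid value.

```python
import numpy as np
import pandas as pd

# Assumption: the question's table, typed in by hand.
df = pd.DataFrame({
    "Time": [y for y in (1990, 1991, 1992, 1993) for _ in range(3)],
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# 1. sort so each Sex forms one contiguous, chronological run
# 2. back-fill: every NaN takes the next valid value below it
# 3. sort_index puts the rows back in their original order
filled = df.sort_values(["Sex", "Time"]).bfill().sort_index()

print(filled)
```

The 1990 and 1991 rows should end up with the 1992 values for their Sex, while the 1992 and 1993 rows stay unchanged.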

You can try this:

df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()

Output:

   Time         Geography     Sex  Population
0  1990  Northern Ireland  Female    831100.0
1  1990  Northern Ireland    Male    792100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland  Female    831100.0
4  1991  Northern Ireland    Male    792100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland  Female    831100.0
7  1992  Northern Ireland    Male    792100.0
8  1992  Northern Ireland   Total   1623300.0
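
To see why this works, it may help to look at the intermediate wide table. The sketch below rebuilds the question's data by hand (an assumption on my part) and splits the one-liner into steps; `wide`, `wide_filled` and `long_again` are illustrative names only. After unstacking there is one column per Sex, so bfill moves up each column independently.

```python
import numpy as np
import pandas as pd

# Assumption: the question's table, typed in by hand.
df = pd.DataFrame({
    "Time": [y for y in (1990, 1991, 1992, 1993) for _ in range(3)],
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# One row per (Time, Geography), one column per Sex.
wide = df.set_index(["Time", "Geography", "Sex"])["Population"].unstack("Sex")

# Back-fill each Sex column: NaNs take the next valid value below.
wide_filled = wide.bfill()

# Back to the long layout of the question.
long_again = wide_filled.stack().rename("Population").reset_index()

print(wide_filled)
print(long_again)
```

With more than one Geography, rows of different regions would be interleaved in the wide index, so the fill would probably need to be done per region, for example with `wide.groupby(level="Geography").bfill()`.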

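I would use groupby with ffill and bfill (chaining both is just a safeguard, so that gaps on either side of a group's valid values get filled):

# transform returns a result aligned with df's original index
df['Population'] = df.groupby(['Geography', 'Sex'])['Population'].transform(lambda x: x.ffill().bfill())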
df
   Time        Geography     Sex  Population
0  1990  NorthernIreland    Male    792100.0
1  1990  NorthernIreland  Female    831100.0
2  1990  NorthernIreland   Total   1623300.0
3  1991  NorthernIreland    Male    792100.0
4  1991  NorthernIreland  Female    831100.0
5  1991  NorthernIreland   Total   1623300.0
6  1992  NorthernIreland    Male    792100.0
7  1992  NorthernIreland  Female    831100.0
8  1992  NorthernIreland   Total   1623300.0
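
A small sketch of where filling per group makes a difference. It rebuilds the question's table by hand and, purely as a hypothetical, blanks the 1993 Female value (that gap is not in the original data; `toy`, `grouped` and `ungrouped` are illustrative names). A fill done per group stays inside the Female rows, while a back-fill over the frame sorted by Sex picks up the next group's value.

```python
import numpy as np
import pandas as pd

# Assumption: the question's table, typed in by hand.
df = pd.DataFrame({
    "Time": [y for y in (1990, 1991, 1992, 1993) for _ in range(3)],
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# Hypothetical gap, added only for illustration: 1993 Female goes missing.
toy = df.copy()
mask = (toy["Time"] == 1993) & (toy["Sex"] == "Female")
toy.loc[mask, "Population"] = np.nan

# Per-group fill: values never cross from one Sex into another.
grouped = toy.groupby(["Geography", "Sex"])["Population"].transform(
    lambda s: s.ffill().bfill()
)

# Whole-frame back-fill after sorting: the trailing Female gap is
# filled from the first valid Male value that follows it.
ungrouped = toy.sort_values(["Sex", "Time"])["Population"].bfill().sort_index()

print(grouped.loc[10], ungrouped.loc[10])
```

Row 10 is the 1993 Female row: the grouped version carries the 1992 Female value forward, whereas the ungrouped version borrows the 1992 Male value.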