Fill NA values by applying a condition on another column
I have a df with two columns. The first column has monthly values, but the second one only has quarterly values. I want to fill the NA values of the second column by applying the same percentage change as in the first column. For example, the original df looks like this:
            ColA  ColB
2019-12-31   100     5
2020-01-31   200    NA
2020-02-28   300    NA
2020-03-31   200    20
2020-04-30   300    NA
I want to calculate the percentage change of ColA and then fill ColB according to that percentage change. The result should look like this:
            ColA  ColB
2019-12-31   100     5
2020-01-31   200    10
2020-02-28   300    15
2020-03-31   200    20
2020-04-30   300    30
I can calculate the percentage change of ColA with pct_change(), but I can't figure out how to apply it only to the NA values of ColB. Is there a solution for this?
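For anyone who wants to reproduce this, the sample data can be built roughly as follows. This is only a sketch: the DatetimeIndex and the use of np.nan for the missing entries are assumptions, since the question shows just the printed frame.

```python
import numpy as np
import pandas as pd

# Assumed construction of the sample frame; values and dates come from the printed output
df = pd.DataFrame(
    {
        "ColA": [100, 200, 300, 200, 300],
        "ColB": [5, np.nan, np.nan, 20, np.nan],
    },
    index=pd.to_datetime(
        ["2019-12-31", "2020-01-31", "2020-02-28", "2020-03-31", "2020-04-30"]
    ),
)

# Month-over-month percentage change of ColA (the first row is NaN)
pct = df["ColA"].pct_change()

# Boolean mask of the rows where ColB still needs a value
missing = df["ColB"].isna()
```

The mask `missing` marks exactly the rows that the answers below have to fill.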
You could simply compute a ratio, ffill it, and use combine_first to update missing values:
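# ColB / ColA is only defined where ColB is known; carry the last known ratio forward
ratio = (df['ColB'] / df['ColA']).ffill()
# Keep the existing ColB values and fill only the gaps with ColA * ratio
df['ColB'] = df['ColB'].combine_first(df['ColA'] * ratio)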
It is enough to get the expected result:
            ColA  ColB
2019-12-31   100   5.0
2020-01-31   200  10.0
2020-02-28   300  15.0
2020-03-31   200  20.0
2020-04-30   300  30.0
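Why the ratio works: if ColB moves by the same percentage as ColA, then ColB / ColA stays constant between two known values, so forward-filling the ratio and multiplying by ColA is equivalent to applying the percentage change. Below is a self-contained sketch of the same idea. The frame construction is an assumption based on the printed data, and the result is stored in a separate variable named `filled` (chosen here for illustration) so the original column can be compared afterwards.

```python
import numpy as np
import pandas as pd

# Assumed sample data, rebuilt from the question's printed frame
df = pd.DataFrame(
    {
        "ColA": [100, 200, 300, 200, 300],
        "ColB": [5, np.nan, np.nan, 20, np.nan],
    },
    index=pd.to_datetime(
        ["2019-12-31", "2020-01-31", "2020-02-28", "2020-03-31", "2020-04-30"]
    ),
)

# Ratio at the known points (0.05 and 0.10), carried forward over the gaps
ratio = (df["ColB"] / df["ColA"]).ffill()

# combine_first keeps non-missing ColB values and takes ColA * ratio elsewhere
filled = df["ColB"].combine_first(df["ColA"] * ratio)

# Rows that were already known are left as they were
known = df["ColB"].notna()
unchanged = filled[known].equals(df.loc[known, "ColB"])
```

Rows that come before the first known ColB value would stay NaN with this approach, because there is no earlier ratio to carry forward.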
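The first idea is to compute the percentage change of ColA in a helper column, then rebuild ColB by compounding those changes starting from the first value of ColB. Note that this uses the first value only, so later known values of ColB are overwritten (2020-03-31 becomes 10 instead of 20):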
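# Percentage change of ColA; the first row is NaN
df['pct'] = df['ColA'].pct_change()
# Growth factors 1 + pct (the leading NaN counts as 0), compounded and scaled by the first ColB value
df['ColB'] = df['pct'].add(1,fill_value=0).cumprod().mul(df.loc[df.index[0], 'ColB'])
print(df)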
            ColA  ColB       pct
2019-12-31   100   5.0       NaN
2020-01-31   200  10.0  1.000000
2020-02-28   300  15.0  0.500000
2020-03-31   200  10.0 -0.333333
2020-04-30   300  15.0  0.500000
To restart from every known value instead, create groups that begin at each non-missing value of ColB and compute the percentage change within each group:
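# Start a new group at every non-missing ColB value
df['g'] = df['ColB'].notna().cumsum()
# Percentage change of ColA within each group (NaN on each group's first row)
df['pct'] = df.groupby('g')['ColA'].pct_change()
# Compound the changes per group, starting from the group's first ColB value
df['ColB'] = df.groupby('g')['pct'].transform(lambda x: x.add(1,fill_value=0).cumprod().mul(df.loc[x.index[0], 'ColB']))
print(df)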
            ColA  ColB  g  pct
2019-12-31   100   5.0  1  NaN
2020-01-31   200  10.0  1  1.0
2020-02-28   300  15.0  1  0.5
2020-03-31   200  20.0  2  NaN
2020-04-30   300  30.0  2  0.5
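Compounding the percentage changes inside a group gives the same factor as dividing ColA by the group's first ColA value, so the grouped idea can also be written without the helper columns. The following is a possible variant rather than the code from the answer above; the function name `fill_by_pct_change` and its parameters are made up for illustration, and it assumes ColA has no missing values.

```python
import numpy as np
import pandas as pd


def fill_by_pct_change(df, src="ColA", dst="ColB"):
    """Hypothetical helper: fill NaNs in `dst` so that it follows the relative moves of `src`."""
    # Each non-missing dst value starts a new group
    g = df[dst].notna().cumsum()
    # Anchor values: the first src and the first dst of each group
    src0 = df[src].groupby(g).transform("first")
    dst0 = df[dst].groupby(g).transform("first")
    # Scale the anchor dst value by the growth of src since the anchor row
    return df[src] / src0 * dst0


# Assumed sample data, rebuilt from the question's printed frame
df = pd.DataFrame(
    {
        "ColA": [100, 200, 300, 200, 300],
        "ColB": [5, np.nan, np.nan, 20, np.nan],
    },
    index=pd.to_datetime(
        ["2019-12-31", "2020-01-31", "2020-02-28", "2020-03-31", "2020-04-30"]
    ),
)

result = fill_by_pct_change(df)
```

Here `result` holds 5, 10, 15, 20, 30 for the sample data, and the frame itself gets no extra `g` or `pct` columns.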