How can one overwrite the subsequent values in a dataframe grouping with NaNs once a NaN has been observed?
In the dataframe below, once a NaN has been observed for a particular grouping (i.e. rows with the same Pos and Stop values), I'd like to copy the NaN down, in a vectorized way, to the subsequent rows in that grouping.
import numpy as np
import pandas as pd
df = pd.DataFrame({
'Pos': [np.nan, np.nan, 1., 1., -1., -1., np.nan, -1., -1., -1., -1., 1., np.nan, 1., np.nan, 1., 1.],
'Stop': [np.nan, np.nan, 122.86, 122.86, 128. , 128. , np.nan, 128. , 128, 125.8 , 125.8 , 124.05, np.nan, 123.85, np.nan, 123.85, 123.85]},
index = pd.date_range('2022-01-01',periods=17))
df
Pos Stop
2022-01-01 NaN NaN
2022-01-02 NaN NaN
2022-01-03 1.0 122.86
2022-01-04 1.0 122.86
2022-01-05 -1.0 128.00
2022-01-06 -1.0 128.00
2022-01-07 NaN NaN
2022-01-08 -1.0 128.00
2022-01-09 -1.0 128.00
2022-01-10 -1.0 125.80
2022-01-11 -1.0 125.80
2022-01-12 1.0 124.05
2022-01-13 NaN NaN
2022-01-14 1.0 123.85
2022-01-15 NaN NaN
2022-01-16 1.0 123.85
2022-01-17 1.0 123.85
So, for example, since 2022-01-07 contains NaNs, and 2022-01-08 and 2022-01-09 have the same Pos (-1) and Stop (128) values as the two rows before the NaN, I'd like to replace the values in the 2022-01-08 and 2022-01-09 rows with NaNs. Similarly, since 2022-01-15 contains NaNs, I'd like to replace the values in 2022-01-16 and 2022-01-17 with NaNs.
Note that the rows above the NaNs for a particular grouping should remain unchanged.
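To pin the rule down before vectorizing it, here is a plain row-by-row version. It is a sketch under one assumption not stated explicitly above: a NaN row "belongs" to the (Pos, Stop) pair of the last non-NaN row above it, and from then on every later row with that pair is blanked. blank_after_nan is a hypothetical helper name; it is slow on large frames, but it is handy for checking a vectorized answer against.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Pos': [np.nan, np.nan, 1., 1., -1., -1., np.nan, -1., -1., -1., -1., 1., np.nan, 1., np.nan, 1., 1.],
    'Stop': [np.nan, np.nan, 122.86, 122.86, 128., 128., np.nan, 128., 128., 125.8, 125.8, 124.05, np.nan, 123.85, np.nan, 123.85, 123.85]},
    index=pd.date_range('2022-01-01', periods=17))


def blank_after_nan(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: non-vectorized reference version of the rule.

    Assumes a NaN row belongs to the (Pos, Stop) pair of the last non-NaN
    row above it; every later row with that same pair is set to NaN.
    """
    out = frame.copy()
    poisoned = set()  # (Pos, Stop) pairs that have been interrupted by a NaN row
    last_key = None   # (Pos, Stop) pair of the most recent non-NaN row
    for i, (pos, stop) in enumerate(zip(frame["Pos"], frame["Stop"])):
        if pd.isna(pos):
            # A NaN row marks the group just above it (leading NaNs have none).
            if last_key is not None:
                poisoned.add(last_key)
            continue
        key = (pos, stop)
        if key in poisoned:
            out.iloc[i] = np.nan  # blank the whole row
        last_key = key
    return out


print(blank_after_nan(df))
```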
I tried using a groupby but was unsuccessful. Therefore, my question is: how can one overwrite the subsequent values in a dataframe grouping with NaNs once a NaN has been observed?
Expected output is below:
df1
Pos Stop
2022-01-01 NaN NaN
2022-01-02 NaN NaN
2022-01-03 1.0 122.86
2022-01-04 1.0 122.86
2022-01-05 -1.0 128.00
2022-01-06 -1.0 128.00
2022-01-07 NaN NaN
2022-01-08 NaN NaN
2022-01-09 NaN NaN
2022-01-10 -1.0 125.80
2022-01-11 -1.0 125.80
2022-01-12 1.0 124.05
2022-01-13 NaN NaN
2022-01-14 1.0 123.85
2022-01-15 NaN NaN
2022-01-16 NaN NaN
2022-01-17 NaN NaN
Solution 1:
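One way is to use pandas.DataFrame.interpolate and where. When the rows on both sides of a NaN row share the same Pos and Stop, interpolate fills the NaN row with those values, so it lands in that group. A grouped cummin of a "not NaN" flag then stays True until the first NaN row of each group, and where blanks out every row after it: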
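df2 = df.interpolate()  # fill interior NaN rows linearly (leading NaNs stay NaN)
m = df["Pos"].notna()   # True on rows that were not NaN originally
# within each (Pos, Stop) group, cummin stays True until the first NaN row, then False
m = df2.assign(tmp=m).groupby(["Pos", "Stop"])["tmp"].cummin().eq(1)
new_df = df.where(m)    # rows where the mask is False (or missing) become NaN
print(new_df)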
Output:
Pos Stop
2022-01-01 NaN NaN
2022-01-02 NaN NaN
2022-01-03 1.0 122.86
2022-01-04 1.0 122.86
2022-01-05 -1.0 128.00
2022-01-06 -1.0 128.00
2022-01-07 NaN NaN
2022-01-08 NaN NaN
2022-01-09 NaN NaN
2022-01-10 -1.0 125.80
2022-01-11 -1.0 125.80
2022-01-12 1.0 124.05
2022-01-13 NaN NaN
2022-01-14 1.0 123.85
2022-01-15 NaN NaN
2022-01-16 NaN NaN
2022-01-17 NaN NaN
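A variant of the same idea, sketched under the assumption that a NaN row should always belong to the group directly above it: use ffill instead of interpolate to give each NaN row its key. Interpolation can hand a NaN row an in-between key (here 2022-01-13 gets Stop 123.95, halfway between 124.05 and 123.85), which is harmless unless that in-between value happens to match a real (Pos, Stop) pair elsewhere; forward-filling avoids that case. The reindex with fill_value=False is only there so the mask lines up with df however a given pandas version reports rows whose group key is NaN. On the sample data it should print the same frame as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Pos': [np.nan, np.nan, 1., 1., -1., -1., np.nan, -1., -1., -1., -1., 1., np.nan, 1., np.nan, 1., 1.],
    'Stop': [np.nan, np.nan, 122.86, 122.86, 128., 128., np.nan, 128., 128., 125.8, 125.8, 124.05, np.nan, 123.85, np.nan, 123.85, 123.85]},
    index=pd.date_range('2022-01-01', periods=17))

# Assumption: a NaN row belongs to the (Pos, Stop) group directly above it,
# so forward-fill (instead of interpolating) to give every NaN row a key.
keys = df.ffill()

# True on rows that are not NaN in the original frame.
valid = df["Pos"].notna()

# Within each (Pos, Stop) group, cummin stays True until the first NaN row
# of that group and is False from then on. Rows whose key is still NaN
# (the leading NaNs) belong to no group; eq(1) plus reindex with
# fill_value=False turns them into False whichever way pandas reports them.
keep = (
    valid.groupby([keys["Pos"], keys["Stop"]])
    .cummin()
    .eq(1)
    .reindex(df.index, fill_value=False)
)

new_df = df.where(keep)
print(new_df)
```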