How can one overwrite the subsequent values in a DataFrame grouping with NaNs once a NaN has been observed?

My question is how can one overwrite the subsequent values in a dataframe grouping once a NaN has been observed?

In the DataFrame below, once a NaN has been observed for a particular grouping (i.e. rows with the same Pos and Stop values), I'd like to propagate NaNs, in a vectorized way, to the subsequent rows in that grouping.

import numpy as np
import pandas as pd

df = pd.DataFrame({
'Pos': [np.nan, np.nan,  1.,  1., -1., -1., np.nan, -1., -1., -1., -1.,  1., np.nan, 1.,  np.nan,  1.,  1.], 
'Stop': [np.nan, np.nan, 122.86, 122.86, 128.  , 128.  , np.nan, 128.  , 128, 125.8 , 125.8 , 124.05, np.nan, 123.85, np.nan, 123.85, 123.85]}, 
index = pd.date_range('2022-01-01',periods=17))

df

            Pos    Stop
2022-01-01  NaN     NaN
2022-01-02  NaN     NaN
2022-01-03  1.0  122.86
2022-01-04  1.0  122.86
2022-01-05 -1.0  128.00
2022-01-06 -1.0  128.00
2022-01-07  NaN     NaN
2022-01-08 -1.0  128.00
2022-01-09 -1.0  128.00
2022-01-10 -1.0  125.80
2022-01-11 -1.0  125.80
2022-01-12  1.0  124.05
2022-01-13  NaN     NaN
2022-01-14  1.0  123.85
2022-01-15  NaN     NaN
2022-01-16  1.0  123.85
2022-01-17  1.0  123.85

For example, since 2022-01-07 contains NaNs and 2022-01-08 and 2022-01-09 have the same values for Pos (i.e. -1) and Stop (i.e. 128) as the two rows before the NaN, I'd like to replace the values in the 2022-01-08 and 2022-01-09 rows with NaNs. Similarly, since 2022-01-15 contains NaNs, I'd like to replace the values in 2022-01-16 and 2022-01-17 with NaNs. Note that the rows above the NaNs for a particular grouping should remain unchanged.

I tried using a groupby but was unsuccessful. Therefore, my question is: how can one overwrite the subsequent values in a DataFrame grouping with NaNs once a NaN has been observed?

Expected output is below:

df1
            Pos    Stop
2022-01-01  NaN     NaN
2022-01-02  NaN     NaN
2022-01-03  1.0  122.86
2022-01-04  1.0  122.86
2022-01-05 -1.0  128.00
2022-01-06 -1.0  128.00
2022-01-07  NaN     NaN
2022-01-08  NaN     NaN
2022-01-09  NaN     NaN
2022-01-10 -1.0  125.80
2022-01-11 -1.0  125.80
2022-01-12  1.0  124.05
2022-01-13  NaN     NaN
2022-01-14  1.0  123.85
2022-01-15  NaN     NaN
2022-01-16  NaN     NaN
2022-01-17  NaN     NaN

Solution 1:

One way is to use pandas.DataFrame.interpolate and where:

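# Linearly interpolate so a NaN row between two identical rows takes that group's values
df2 = df.interpolate()
# True for rows that were not NaN in the original frame
m = df["Pos"].notna()
# Within each (Pos, Stop) group, the flag turns False at the first NaN row and stays False
m = df2.assign(tmp=m).groupby(["Pos", "Stop"])["tmp"].cummin().eq(1)
# Keep rows whose flag is still True; every other row becomes NaN
new_df = df.where(m)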

print(new_df)

Output:

            Pos    Stop
2022-01-01  NaN     NaN
2022-01-02  NaN     NaN
2022-01-03  1.0  122.86
2022-01-04  1.0  122.86
2022-01-05 -1.0  128.00
2022-01-06 -1.0  128.00
2022-01-07  NaN     NaN
2022-01-08  NaN     NaN
2022-01-09  NaN     NaN
2022-01-10 -1.0  125.80
2022-01-11 -1.0  125.80
2022-01-12  1.0  124.05
2022-01-13  NaN     NaN
2022-01-14  1.0  123.85
2022-01-15  NaN     NaN
2022-01-16  NaN     NaN
2022-01-17  NaN     NaN
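
For anyone who wants to run this end to end, here is a self-contained sketch of a closely related variant. It is not the code above: it swaps interpolate for ffill, under the assumption that a NaN row belongs to the group of the last valid row above it, and it carries a "NaN seen" flag forward with a grouped cummax. The names keys, is_nan, seen_nan and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Pos": [np.nan, np.nan, 1, 1, -1, -1, np.nan, -1, -1,
                -1, -1, 1, np.nan, 1, np.nan, 1, 1],
        "Stop": [np.nan, np.nan, 122.86, 122.86, 128, 128, np.nan, 128, 128,
                 125.8, 125.8, 124.05, np.nan, 123.85, np.nan, 123.85, 123.85],
    },
    index=pd.date_range("2022-01-01", periods=17),
)

# Assumption: a NaN row is attributed to the last valid (Pos, Stop) pair above it,
# so a forward fill supplies the group key for every NaN row.
keys = df.ffill()

# 1.0 on rows that were NaN in the original frame, 0.0 elsewhere.
is_nan = df["Pos"].isna().astype(float)

# Running maximum per (Pos, Stop) group: becomes 1.0 at the first NaN row of a
# group and stays 1.0 for every later row with the same key.
seen_nan = is_nan.groupby([keys["Pos"], keys["Stop"]]).cummax().gt(0)

# Blank out every row at or after the first NaN of its group.
# The leading NaN rows have no group key, but they are NaN already.
result = df.where(~seen_nan)

print(result)
```

On the sample data this should give the same frame as the expected output in the question. One caveat that applies to both versions: the grouping is by value across the whole frame, not by consecutive run, so if the same (Pos, Stop) pair shows up again much later, a NaN seen in the earlier run would blank the later rows as well.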