How to fill NaN values based on the top and bottom strings with highest frequency
I have a dataframe
of string values with missing values in it. It needs to be populated/filled by the below conditions.
- From the
NaN
value index , Check the last 3 rows and next 3 rows and replace theNaN
with the most frequent/repeated value out of 6 rows. - If there is 2 strings with an equal amount of frequency occurred from the last 3 rows and next 3 rows , replace the
NaN
with the value that has lowest index out of theses 6 rows.
My DataFrame
:
reading
0 talk
1 kill
2 NaN
3 vertical
4 type
5 kill
6 NaN
7 vertical
8 vertical
9 type
10 durable
11 NaN
12 durable
13 vertical
Expected output:
reading
0 talk
1 kill
2 kill
3 vertical
4 type
5 kill
6 vertical
7 vertical
8 vertical
9 type
10 durable
11 vertical
12 durable
13 vertical
Here is the minimum reproducible code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'reading':['talk','kill',np.NAN,'vertical','type','kill',np.NAN,'vertical','vertical','type','durable',np.NAN,'durable','vertical']})
def filldf(df):
# Do the logic here
return df
I am not sure how to approach this problem. Any help will be appreciated !!
If you don't have too many NaN values, you can iterate over the index of NaN "reading" values and simply look for the mode
of the surrounding 6 values of it (use iloc
to get the first occurrence of multiple modes) and assign the values back to the corresponding "NaN" values
msk = df['reading'].isna()
df.loc[msk, 'reading'] = [df.loc[min(0, i-3):i+3, 'reading'].mode().iloc[0] for i in df.index[msk]]
Output:
reading
0 talk
1 kill
2 kill
3 vertical
4 type
5 kill
6 vertical
7 vertical
8 vertical
9 type
10 durable
11 vertical
12 durable
13 vertical