How to fill missing values in a column based on known values for the same group?
I have a df (car_data) with, among others, 2 columns: model and is_4wd.
The is_4wd column is either 0 or 1 and has about 25,000 missing values. However, I know that some models are 4wd because some of their rows already have a 1, while other rows for the same model have nan.
How can I replace the nan values with 1 for the models that already have a 1 somewhere?
I have written a for loop, but I had to change all nan values to 0 and create a variable of unique car models, and the loop takes a long time to complete.
car_data['is_4wd']=car_data['is_4wd'].fillna(0)
car_4wd=car_data.query('is_4wd==1')
caru=car_4wd['model'].unique()
for index, row in car_data.iterrows():
    if row['is_4wd']==0:
        if row['model'] in caru:
            car_data.loc[car_data.model==row['model'],'is_4wd']=1
Is there a better way to do it? I tried several replace() approaches, but to no avail.
The df head looks like this (you can see that ford f-150, for example, has both 1 and nan in is_4wd). The expected outcome is that every nan is replaced with 1 for the models that already have a 1 entered.
   price  model_year           model  condition  cylinders fuel  odometer  \
0   9400      2011.0          bmw x5       good        6.0  gas  145000.0
1  25500         NaN      ford f-150       good        6.0  gas   88705.0
2   5500      2013.0  hyundai sonata   like new        4.0  gas  110000.0
3   1500      2003.0      ford f-150       fair        8.0  gas       NaN
4  14900      2017.0    chrysler 200  excellent        4.0  gas   80903.0

  transmission    type paint_color  is_4wd date_posted  days_listed
0    automatic     SUV         NaN     1.0  2018-06-23           19
1    automatic  pickup       white     1.0  2018-10-19           50
2    automatic   sedan         red     NaN  2019-02-07           79
3    automatic  pickup         NaN     NaN  2019-03-22            9
4    automatic   sedan       black     NaN  2019-04-02           28
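Group your data by the model column and fill the missing is_4wd values with the maximum value of each group (df here is your car_data). Models that have no known value stay nan after the group fill, so the final fillna(0) sets them to 0: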
df['is_4wd'] = df.groupby('model')['is_4wd'] \
                 .transform(lambda x: x.fillna(x.max())).fillna(0).astype(int)
print(df[['model', 'is_4wd']])
# Output:
            model  is_4wd
0          bmw x5       1
1      ford f-150       1
2  hyundai sonata       0
3      ford f-150       1
4    chrysler 200       0
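
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes a small hand-built frame that mirrors only the model and is_4wd columns from the question's head(); the names group_max, known_4wd, filled and filled_isin are just illustrative. It also swaps the lambda for the built-in 'max' aggregation, which should generally be faster on a frame with tens of thousands of rows, and shows an isin-based variant that is closer to what the original loop does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: only the two relevant columns from the question's head().
car_data = pd.DataFrame({
    "model": ["bmw x5", "ford f-150", "hyundai sonata", "ford f-150", "chrysler 200"],
    "is_4wd": [1.0, 1.0, np.nan, np.nan, np.nan],
})

# Variant 1: broadcast each model's maximum back onto its rows.
# max skips NaN, so a model with at least one 1 gives 1,
# while a model with only NaN gives NaN.
group_max = car_data.groupby("model")["is_4wd"].transform("max")
filled = car_data["is_4wd"].fillna(group_max)

# Variant 2: collect the models known to be 4wd and flag all their rows.
# Like the loop in the question, this also overwrites an existing 0
# for those models, whereas variant 1 only touches NaN.
known_4wd = car_data.loc[car_data["is_4wd"] == 1, "model"].unique()
filled_isin = car_data["is_4wd"].mask(car_data["model"].isin(known_4wd), 1.0)

# Treat models with no known value as not 4wd (same choice as fillna(0) above).
car_data["is_4wd"] = filled.fillna(0).astype(int)
print(car_data)
```

If you would rather keep the remaining values as missing instead of assuming 0, skip the last fillna(0).astype(int) step and keep filled as it is.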