Delete rows that have NaN in every cell in Python
I have a dataset called data.
The problem with this data is that a lot of rows have NaN.
I want to delete every row in which all cells except time are NaN.
For example, I want to delete the row with time 2021-12-24 01:00:20 and the row with time 2021-12-24 01:00:30 in this case.
The important thing is that I don't want to delete rows with only a few NaN.
I want to delete only the rows in which every cell except time is NaN.
How can I do this?
You can do it explicitly if you need to tweak it, or just use the function that pandas offers for this:
from pandas import DataFrame, to_datetime
data = [
['2022-01-01 00:00', 1, 2, 3],
['2022-01-02 00:00', None, 2, 3],
['2022-01-03 00:00', None, None, None],
['2022-01-04 00:00', 1, 2, None],
['2022-01-05 00:00', None, None, None],
]
df = DataFrame(data, columns=['time', 'x', 'y', 'z'])
df['time'] = to_datetime(df['time'])
# explicitly select all the columns after the first, check if values are nan,
# then get the indices for all rows where every cell is true - drop those rows
one = df.drop(df[df.iloc[:, 1:].isnull().all(axis=1)].index)
print(one)
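# but pandas allows you to do it in one go as long as `time` is never nan:
# `thresh` is the minimum number of non-NaN values a row needs to be kept,
# so 2 means `time` plus at least one other value
two = df.dropna(thresh=2)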
print(two)
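Another way to express "every cell except time is NaN" is to call dropna with how='all' and restrict the check to the non-time columns through subset. The sketch below assumes the timestamp column is literally named time and uses a small made-up frame; the names value_cols and three are only for illustration. It does not rely on time being free of NaN.

```python
import pandas as pd

# Small illustrative frame: the last row has NaN in every column except `time`
df = pd.DataFrame(
    [
        ['2022-01-01 00:00', 1, 2, 3],
        ['2022-01-02 00:00', None, None, 3],
        ['2022-01-03 00:00', None, None, None],
    ],
    columns=['time', 'x', 'y', 'z'],
)
df['time'] = pd.to_datetime(df['time'])

# Every column except `time` (assumed column name)
value_cols = [col for col in df.columns if col != 'time']

# Drop a row only when all of its value columns are NaN
three = df.dropna(how='all', subset=value_cols)
print(three)
```

Rows with only some NaN values, such as the second row here, are kept; only rows whose value columns are all NaN are removed.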