Filtering all rows with NaT in a column of a DataFrame in Python
Solution 1:
isnull and notnull work with NaT, so you can handle them much the same way you handle NaNs:
>>> df
   a          b  c
0  1        NaT  w
1  2 2014-02-01  g
2  3        NaT  x
>>> df.dtypes
a             int64
b    datetime64[ns]
c            object
Just use isnull to select:
>>> df[df.b.isnull()]
   a   b  c
0  1 NaT  w
2  3 NaT  x
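Conversely, to drop the NaT rows rather than select them, notnull (or dropna restricted to column b) works the same way. A minimal sketch rebuilding the example frame from above:
import pandas as pd

# Rebuild the example frame
df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

# Keep only the rows where b holds a real timestamp
print(df[df.b.notnull()])

# Equivalent: dropna restricted to column b
print(df.dropna(subset=["b"]))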
Solution 2:
For those interested: in my case I wanted to drop the NaT values contained in the DatetimeIndex of a DataFrame. I could not directly use the notnull construction as suggested by Karl D. You first have to create a temporary column out of the index, then apply the mask, and then delete the temporary column again.
df["TMP"] = df.index.values # index is a DateTimeIndex
df = df[df.TMP.notnull()] # remove all NaT values
df.drop(["TMP"], axis=1, inplace=True) # delete TMP again
Solution 3:
Using your example dataframe:
df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})
Before v0.17, this did not work:
df.query('b != b')
and you had to do:
df.query('b == "NaT"') # yes, surprisingly, this works!
Since v0.17 though, both methods work, although I would only recommend the first one.
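For reference, a runnable sketch that reproduces both query variants on the example frame; as noted above, which pandas version you are on determines whether each form actually selects the NaT rows:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

# Exploits NaT != NaT to keep only the rows where b is NaT
print(df.query('b != b'))

# The string-comparison form from the answer; behaviour may vary across versions
print(df.query('b == "NaT"'))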
Solution 4:
I feel that the comment by @DSM is worth an answer of its own, because it addresses the fundamental question.
The misunderstanding comes from the assumption that pd.NaT acts like None. However, while None == None returns True, pd.NaT == pd.NaT returns False. Pandas NaT behaves like a floating-point NaN, which is not equal to itself.
As the previous answers explain, you should use
df[df.b.isnull()] # or notnull(), respectively
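A short sketch illustrating the difference (and the selection that follows from it); the frame here is a minimal made-up example:
import pandas as pd
import numpy as np

print(None == None)        # True
print(pd.NaT == pd.NaT)    # False: NaT is not equal to itself
print(np.nan == np.nan)    # False: the same behaviour as a float NaN

df = pd.DataFrame({"b": [pd.NaT, pd.to_datetime("2014-02-01")]})
print(df[df.b.isnull()])   # selects the NaT row
print(df[df.b.notnull()])  # selects the row with a real timestamp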