Locate first and last non NaN values in a Pandas DataFrame
@behzad.nouri's solution worked perfectly to return the first and last non-NaN values
using Series.first_valid_index and Series.last_valid_index, respectively.
Here's some helpful examples.
Series
s = pd.Series([np.NaN, 1, np.NaN, 3, np.NaN], index=list('abcde'))
s
a NaN
b 1.0
c NaN
d 3.0
e NaN
dtype: float64
# first valid index
s.first_valid_index()
# 'b'
# first valid position
s.index.get_loc(s.first_valid_index())
# 1
# last valid index
s.last_valid_index()
# 'd'
# last valid position
s.index.get_loc(s.last_valid_index())
# 3
Alternative solution using notna
and idxmax
:
# first valid index
s.notna().idxmax()
# 'b'
# last valid index
s.notna()[::-1].idxmax()
# 'd'
DataFrame
df = pd.DataFrame({
'A': [np.NaN, 1, np.NaN, 3, np.NaN],
'B': [1, np.NaN, np.NaN, np.NaN, np.NaN]
})
df
A B
0 NaN 1.0
1 1.0 NaN
2 NaN NaN
3 3.0 NaN
4 NaN NaN
(first|last)_valid_index
isn't defined on DataFrames, but you can apply them on each column using apply
.
# first valid index for each column
df.apply(pd.Series.first_valid_index)
A 1
B 0
dtype: int64
# last valid index for each column
df.apply(pd.Series.last_valid_index)
A 3
B 0
dtype: int64
As before, you can also use notna
and idxmax
. This is slightly more natural syntax.
# first valid index
df.notna().idxmax()
A 1
B 0
dtype: int64
# last valid index
df.notna()[::-1].idxmax()
A 3
B 0
dtype: int64