Find empty or NaN entry in Pandas Dataframe

np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]
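
For a self-contained version of the session above, here is a minimal sketch on a small made-up frame. The name `demo` and its values are illustrative assumptions, not the `df` from the question; the last step translates the positions into index and column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not the DataFrame from the question.
demo = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": ["x", "y", None],
})

# Row and column positions of every missing cell.
rows, cols = np.where(pd.isnull(demo))

# Pair the positions up and translate them to index/column labels.
missing = [(demo.index[i], demo.columns[j]) for i, j in zip(rows, cols)]
print(missing)
```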

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that applymap calls a Python function once for each cell of the DataFrame, which can be slow for a large DataFrame. It would be better to arrange for all the blank cells to contain NaN instead, so that you can use pd.isnull. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)
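
As a rough sketch of that suggestion, blank strings can be converted to NaN with `DataFrame.replace`, after which a single `pd.isnull` check covers both cases. The comparison `demo.eq("")` is a vectorized alternative that avoids the per-cell Python call. The frame `demo` is made-up sample data, not the `df` from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one empty string and one NaN.
demo = pd.DataFrame({
    "A": ["a", "", "c"],
    "B": [1.0, 2.0, np.nan],
})

# Vectorized test for empty strings, no per-cell Python function.
blank_rows, blank_cols = np.where(demo.eq(""))

# Turn blanks into NaN so a single isnull check finds both kinds of cell.
cleaned = demo.replace("", np.nan)
rows, cols = np.where(pd.isnull(cleaned))
```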

Try this:

df[df['column_name'] == ''].index

and for NaNs you can try:

pd.isna(df['column_name'])
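
A minimal sketch of both single-column checks on made-up data; `demo` and the column name `"name"` are placeholders standing in for `df` and `'column_name'`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "name" stands in for 'column_name'.
demo = pd.DataFrame({"name": ["ann", "", np.nan, "bob"]})

# Index labels of rows whose value is an empty string.
empty_idx = demo[demo["name"] == ""].index

# Boolean mask of NaN values, then the matching index labels.
nan_mask = pd.isna(demo["name"])
nan_idx = demo[nan_mask].index
```

Note that `pd.isna` on its own returns a boolean Series; indexing the frame with it, as in the last line, is what yields the row labels.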

Check if the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together using the bitwise OR operator |.

Sum along axis 0 to find columns with missing data, then sum along axis 1 to find the index locations for rows with missing data.

missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)

>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2            1.10035
5 -0.508501         
6       NaN      NaN
7       NaN      NaN
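
The same idea can be sketched end to end on made-up data. This variant is an assumption-laden rewrite rather than the answer's exact code: it builds one combined boolean mask and uses `any` instead of `sum(...)` followed by `gt(0)`. The frame `demo` is hypothetical, not the `df2` shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not the df2 from the answer above.
demo = pd.DataFrame({
    "A1": [1.0, 2.0, 3.0],
    "A2": ["x", "", "z"],
    "A3": [0.5, np.nan, 1.5],
})

# True wherever a cell is NaN or an empty string.
mask = demo.isnull() | demo.eq("")

# Columns (axis 0) and rows (axis 1) that contain at least one such cell.
missing_cols = mask.columns[mask.any(axis=0)]
missing_rows = mask.index[mask.any(axis=1)]

# Restrict the frame to the affected rows and columns.
subset = demo.loc[missing_rows, missing_cols]
```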

I've resorted to

df[ (df[column_name].isnull()) | (df[column_name]==u'') ].index

lately. That gets both null and empty-string cells in one go.
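
A minimal sketch of a combined single-column mask, assuming made-up data where `"col"` stands in for `column_name`. It shows both the rows whose cell is null or an empty string and, by inverting the mask, the rows that hold real values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "col" stands in for column_name.
demo = pd.DataFrame({"col": ["a", "", np.nan, "d"]})

# Rows where the cell is null or an empty string.
is_missing = demo["col"].isnull() | (demo["col"] == "")
missing_idx = demo[is_missing].index

# Inverting the mask gives the rows that hold real values.
present_idx = demo[~is_missing].index
```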
