Check if a column value is numeric in pandas dataframe

I have a dataset that I want to clean. The dataset consists of 54 columns and 315 rows. For one of the columns, I want to check whether all the values in that column are numeric. I have tried the following:

import numpy as np
import pandas as pd

work_sheet = pd.read_excel('2006_sale.xlsx', sheet_name='Sheet1')
df = work_sheet.copy()

TRY 1

for idx,val in enumerate(df['LotArea']):
    if(not(str(val).isnumeric())):        # Check if a value is numeric or not
        df.at[idx,'LotArea'] = np.nan     # If the value is not numeric then replace it with null

TRY 2

for idx,val in enumerate(df['LotArea']):
    if(not(isinstance(val,float))):        # Check if a value is numeric or not
        df.at[idx,'LotArea'] = np.nan     # If the value is not numeric then replace it with null

Sample values of LotArea: (image not included)

Problem with both approaches: somehow each value is detected as non-numeric, and my final output looks like this: (image not included)

Any idea where I am going wrong?

A for loop is not needed to achieve this. You can use the pd.to_numeric function with errors set to 'coerce', which replaces every value that cannot be parsed as a number with NaN.

df['LotArea'] = pd.to_numeric(df['LotArea'], errors='coerce')
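To answer the "are all values numeric?" part directly, the coerced result can be compared against the original column. The following is a minimal sketch, not a definitive implementation: the sample values are made up (the actual LotArea values were only shown in an image), and the names `converted`, `bad_mask`, `all_numeric` and `bad_values` are illustrative. The first two lines also show why per-value checks like the ones in the loops are brittle.

```python
import pandas as pd

# Why the loop checks are brittle:
# str() of a float contains a ".", which str.isnumeric() rejects,
# and a plain integer is not an instance of float.
float_text_is_numeric = str(8450.0).isnumeric()   # '8450.0' -> False
int_is_float = isinstance(8450, float)            # False

# Hypothetical sample data standing in for the LotArea column.
df = pd.DataFrame({"LotArea": [8450, "9600", "abc", None, 11250.0]})

# Values that cannot be parsed as numbers become NaN.
converted = pd.to_numeric(df["LotArea"], errors="coerce")

# True only where a value was present but could not be parsed.
# Entries that were already missing are not counted as "non-numeric".
bad_mask = converted.isna() & df["LotArea"].notna()

all_numeric = not bad_mask.any()
bad_values = df.loc[bad_mask, "LotArea"].tolist()

# Store the cleaned, numeric column back into the frame.
df["LotArea"] = converted

print(all_numeric)
print(bad_values)
```

Inspecting `bad_values` before overwriting the column makes it easier to see what kind of junk is in the data (stray text, units, placeholders) instead of silently turning it into NaN.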

First, I would like to drop this link here. A for-loop in pandas is an anti-pattern, and there are many more performant ways to transform data without one. Please check the link.

https://stackoverflow.com/a/55557758/2956135

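To answer your question, use the replace function with a regex. The pattern below matches empty strings and any string that contains a non-digit character, so those values are replaced with NaN. Regex replacement only applies to string values, so numbers already stored as int or float are left unchanged.

df['LotArea'] = df['LotArea'].replace(regex=r'^$|\D', value=np.nan)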
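To see which entries a regex-based replace turns into NaN, here is a small sketch on made-up values. It assumes the pattern `r'^$|\D'` (an empty string, or any non-digit character anywhere in the string); the Series `s` and the name `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column: digit strings, junk, an empty string,
# a decimal, a negative number, and a real integer.
s = pd.Series(["8450", "96a0", "", "12.5", "-3", 11250], dtype=object)

# Strings that are empty or contain any non-digit character become NaN.
# Non-string values (the integer 11250) are not touched by regex replace.
cleaned = s.replace(regex=r'^$|\D', value=np.nan)

print(cleaned.isna().tolist())
```

Note that a digits-only pattern also flags decimals such as "12.5" and negatives such as "-3", because "." and "-" are non-digit characters. If those should count as numeric, pd.to_numeric with errors='coerce' is the safer choice.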
