Check if a column value is numeric in pandas dataframe
I have a dataset that I want to clean. The dataset consists of 54 columns and 315 rows. For one of the columns, I want to find out whether all the values in that column are numeric. I have done the following:
import numpy as np
import pandas as pd
work_sheet = pd.read_excel('2006_sale.xlsx', sheet_name='Sheet1')
df = work_sheet.copy()
TRY 1
for idx,val in enumerate(df['LotArea']):
    if(not(str(val).isnumeric())): # Check if a value is numeric or not
        df.at[idx,'LotArea'] = np.nan # If the value is not numeric then replace it with null
TRY 2
for idx,val in enumerate(df['LotArea']):
    if(not(isinstance(val,float))): # Check if a value is numeric or not
        df.at[idx,'LotArea'] = np.nan # If the value is not numeric then replace it with null
Sample values of LotArea are:
The problem with both approaches: somehow every value is detected as non-numeric, and my final output looks like this:
Any idea where I am going wrong?
A for loop is not needed to achieve this. You can use pd.to_numeric with errors='coerce': every value that cannot be parsed as a number is replaced with NaN.
df['LotArea'] = pd.to_numeric(df['LotArea'], errors='coerce')
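A small sketch of the coercion on made-up data (the question's real sample values are not reproduced here, so the series below is hypothetical). It also shows one reason the string check in TRY 1 misfires: str(9600.0) is "9600.0", and the decimal point makes isnumeric() return False. Comparing non-missing counts before and after coercion then answers the original question of whether every value was numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['LotArea']: numbers mixed with junk text and a gap
lot_area = pd.Series([8450, 9600.0, "11250", "abc", None], dtype="object")

# The TRY 1 check rejects a valid float because of the "." in "9600.0"
print(str(9600.0).isnumeric())  # False

# Coerce: anything that cannot be parsed as a number becomes NaN
cleaned = pd.to_numeric(lot_area, errors="coerce")
print(cleaned.tolist())  # [8450.0, 9600.0, 11250.0, nan, nan]

# Were all non-missing values numeric? Compare counts before and after coercion
all_numeric = cleaned.notna().sum() == lot_area.notna().sum()
print(all_numeric)  # False, because "abc" failed to parse
```

Note that numeric strings such as "11250" are converted rather than discarded, and the result is a float64 column.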
First, I would like to drop this link here: a for loop in pandas is an anti-pattern, and there are many more performant ways to transform data without one. Please check the link.
https://stackoverflow.com/a/55557758/2956135
To answer your question, use the replace function with a regex.
# Only string cells are tested against the pattern; a match replaces the whole cell with NaN
df['LotArea'] = df.LotArea.replace(regex=r'|[^\d+]', value=np.nan)
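A quick check of what the regex call does, on hypothetical sample data. pandas applies the pattern only to string cells, and because the pattern begins with an empty alternative (the leading |), it matches every string. So all text cells become NaN, including numeric-looking strings such as "11250", while values already stored as numbers are left untouched. If numeric strings should be kept, pd.to_numeric with errors='coerce' is the safer choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['LotArea'] (object dtype, mixed contents)
lot_area = pd.Series([8450, 9600.0, "11250", "abc"], dtype="object")

# Same call as in the answer; non-string cells are never matched by the regex
replaced = lot_area.replace(regex=r'|[^\d+]', value=np.nan)

print(replaced.isna().tolist())  # [False, False, True, True]
```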