How to find string data-type that includes a number in Pandas DataFrame

I have a DataFrame with two columns. One column contains string values that may or may not be numbers (integer or float).

Sample:

import pandas as pd
import numpy as np

data = [('A', '>10'),
        ('B', '10'),
        ('C', '<10'),
        ('D', '10'),
        ('E', '10-20'),
        ('F', '20.0'),
        ('G', '25.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value'])

Entries in column value are strings, but only some of them represent numbers.

What I want to get:

Find which rows have numeric values in column value.
Remove other rows from dataset.

The final result should look like:

name    value    
'B'      10         
'D'      10 
'F'      20.0  
'G'      25.1

I tried str.isnumeric(), but it returns True only for integer strings such as '10', not for floats such as '20.0' (the decimal point is not a numeric character).
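A quick check illustrates the problem: Series.str.isnumeric() is True only when every character in the string is numeric, so the '.' in '20.0' and '25.1' makes it return False. The sketch below simply rebuilds the sample data from the question:

```python
import pandas as pd

data = [('A', '>10'), ('B', '10'), ('C', '<10'), ('D', '10'),
        ('E', '10-20'), ('F', '20.0'), ('G', '25.1')]
data_df = pd.DataFrame(data, columns=['name', 'value'])

# True only when every character is numeric, so '20.0' and '25.1' fail on the '.'
isnum = data_df['value'].str.isnumeric()

# Keeps only B and D; the float rows F and G are lost
print(data_df[isnum])
```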

If you have any idea how to solve this problem, please let me know.

Updated Question (multi columns):

(The same question when there are more than one column with numeric values)

Similarly, I have a DataFrame with three columns. Two of them contain string values that may or may not be numbers (integer or float).

Sample:

import pandas as pd
import numpy as np

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value1', 'value2'])

Entries in columns value1 and value2 are strings, but only some of them represent numbers.

What I want to get:

Find the rows where both value1 and value2 hold numeric values.
Remove other rows from dataset.

The final result should look like:

name    value1    value2
'B'      10         15
'D'      10         15 
'G'      25.1       30.1

You can use pandas.to_numeric with errors='coerce', which converts strings that cannot be parsed (such as '>10' or '10-20') to NaN, then dropna to remove those rows:

(data_df.assign(value=pd.to_numeric(data_df['value'], errors='coerce'))
        .dropna(subset=['value'])
)

NB. This upcasts the integers to floats. That is how a Series works: it holds a single dtype, which has to be a float here because of values like 25.1 and the NaN produced by coercion. Upcasting is better than forcing an object dtype.

output:

  name  value
1    B   10.0
3    D   10.0
5    F   20.0
6    G   25.1
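To see what the coercion step does on its own, the sketch below prints the intermediate Series before dropna: each unparsable string becomes NaN and the result has a float dtype, which is where the upcasting comes from. It reuses the question's sample data; the variable names coerced and result are just illustrative.

```python
import pandas as pd

data = [('A', '>10'), ('B', '10'), ('C', '<10'), ('D', '10'),
        ('E', '10-20'), ('F', '20.0'), ('G', '25.1')]
data_df = pd.DataFrame(data, columns=['name', 'value'])

# '>10', '<10' and '10-20' become NaN instead of raising an error
coerced = pd.to_numeric(data_df['value'], errors='coerce')
print(coerced.dtype)
print(coerced.isna().tolist())

# Replace the column with the parsed numbers, then drop the NaN rows
result = data_df.assign(value=coerced).dropna(subset=['value'])
print(result)
```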

If you just want to slice the rows and keep the string type:

data_df[pd.to_numeric(data_df['value'], errors='coerce').notna()]

output:

  name value
1    B    10
3    D    10
5    F  20.0
6    G  25.1
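The slice keeps the original strings because the coerced Series is only used to build a boolean mask and never replaces the column. A small check on the question's sample data (mask and filtered are illustrative names):

```python
import pandas as pd

data = [('A', '>10'), ('B', '10'), ('C', '<10'), ('D', '10'),
        ('E', '10-20'), ('F', '20.0'), ('G', '25.1')]
data_df = pd.DataFrame(data, columns=['name', 'value'])

# Boolean mask: True where the string parses as a number
mask = pd.to_numeric(data_df['value'], errors='coerce').notna()
filtered = data_df[mask]

# The value column still holds the original strings, e.g. '20.0' rather than 20.0
print(filtered['value'].tolist())
```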

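Updated question (multi columns)

Build a boolean mask over the value columns, reduce it across each row with all (every column numeric) or any (at least one column numeric), then slice: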

# True where every value column parses as a number (use .any(axis=1) for "at least one")
mask = data_df[data_df.columns[1:]].apply(pd.to_numeric, errors='coerce').notna().all(axis=1)
data_df[mask]
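For comparison, the sketch below rebuilds the second sample and applies both reductions: all(axis=1) returns the expected B, D, G, while any(axis=1) also keeps F, whose value1 is numeric even though value2 is 'ABC'. Listing the value columns by name instead of using data_df.columns[1:] is my own choice here, so the mask does not depend on column order; value_cols is a hypothetical name.

```python
import pandas as pd

data = [('A', '>10', 'ABC'), ('B', '10', '15'), ('C', '<10', '>10'),
        ('D', '10', '15'), ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'), ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

value_cols = ['value1', 'value2']  # hypothetical: the columns to validate
# Coerce each column separately; non-numeric strings become NaN
numeric = data_df[value_cols].apply(pd.to_numeric, errors='coerce')

all_numeric = data_df[numeric.notna().all(axis=1)]  # every column numeric
any_numeric = data_df[numeric.notna().any(axis=1)]  # at least one numeric

print(all_numeric)
print(any_numeric)
```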
