How to remove all rows in a numpy.ndarray that contain non-numeric values
Basically, I'm doing some data analysis. I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, being NaN, or by being a string written "NA").
I want to clean out all rows containing any entry like this. How do I do that with a numpy ndarray?
>>> import numpy as np
>>> a = np.array([[1,2,3], [4,5,np.nan], [7,8,9]])
>>> a
array([[ 1., 2., 3.],
       [ 4., 5., nan],
       [ 7., 8., 9.]])
>>> a[~np.isnan(a).any(axis=1)]
array([[ 1., 2., 3.],
       [ 7., 8., 9.]])
Then reassign the result to a.
Explanation: np.isnan(a) returns an array of the same shape with True where the entry is NaN and False elsewhere. .any(axis=1) reduces an m*n array to a length-m array by applying a logical or across each row, ~ inverts True/False, and a[ ] selects just the rows of the original array that have True within the brackets.
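The snippet above only handles NaN in a float array. If the data also contains strings such as "NA" (as in the question), np.isnan will not accept them directly. One possible approach, sketched here under the assumption that the data fits in memory as a 2-D list or array of mixed values, is to convert every entry to float first, mapping anything unparseable to NaN, and then apply the same mask. The helper names drop_non_numeric_rows and to_float are made up for this illustration.

```python
import numpy as np

def to_float(x):
    """Hypothetical helper: parse x as a float, or return NaN if it cannot be parsed."""
    try:
        return float(x)
    except (TypeError, ValueError):
        # Covers strings like "NA" as well as None
        return np.nan

def drop_non_numeric_rows(raw):
    """Hypothetical helper: keep only the rows whose entries are all numeric."""
    # Build a float array; unparseable entries become NaN
    numeric = np.array([[to_float(x) for x in row] for row in raw], dtype=float)
    is_nan = np.isnan(numeric)        # shape (m, n): True where the entry is NaN
    row_has_nan = is_nan.any(axis=1)  # shape (m,): True if any entry in the row is NaN
    keep = ~row_has_nan               # shape (m,): True for rows to keep
    return numeric[keep]

raw = [[1, 2, 3], [4, "NA", 6], [7, 8, np.nan], [10, 11, 12]]
clean = drop_non_numeric_rows(raw)
print(clean)
```

Here the second row is dropped because of the "NA" string and the third because of the NaN, so only the first and last rows remain. The Python-level loop is slow for large datasets; if the data comes from a text file, handling missing values at load time is likely a better fit.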