Forced conversion of non-numeric numpy arrays with NAN replacement
Solution 1:
You can convert an array of strings into an array of floats (with NaN for unparseable entries) using np.genfromtxt:
In [83]: np.set_printoptions(precision=3, suppress=True)
In [84]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']))
Out[84]: array([ 1. , 2. , 3.14 , 0.001, nan, nan, inf, -inf])
Here is a way to identify "numeric" strings:
In [34]: x
Out[34]:
array(['1', '2', 'a'],
dtype='|S1')
In [35]: x.astype('unicode')
Out[35]:
array([u'1', u'2', u'a'],
dtype='<U1')
In [36]: np.char.isnumeric(x.astype('unicode'))
Out[36]: array([ True, True, False], dtype=bool)
Note that "numeric" here means a unicode string containing only digit characters -- that is, characters that have the Unicode numeric-value property. This does not include the decimal point, so u'1.3' is not considered "numeric".
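If you need a test that also accepts decimal points and scientific notation, one simple sketch (the helper name `is_float_string` is my own, not a NumPy API) is to attempt a float() conversion per element:

```python
import numpy as np

def is_float_string(s):
    """Return True if s parses as a float (accepts '1.3', '1e-3', 'nan', 'inf')."""
    try:
        float(s)
        return True
    except ValueError:
        return False

x = np.array(['1', '2', '1.3', 'a'])
mask = np.array([is_float_string(s) for s in x])
# mask is [True, True, True, False] -- '1.3' passes, unlike np.char.isnumeric
```

This trades the vectorized speed of np.char.isnumeric for the full flexibility of Python's float parser.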
Solution 2:
If you happen to be using pandas as well, you can use the pd.to_numeric() function:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: x = np.array(['1', '2', 'a'])
In [4]: pd.to_numeric(x, errors='coerce')
Out[4]: array([ 1., 2., nan])
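With errors='coerce', every entry that cannot be parsed becomes NaN, while decimals and scientific notation convert normally. A minimal self-contained sketch:

```python
import numpy as np
import pandas as pd

x = np.array(['1', '2', '3.14', '1e-3', 'b'])
# 'b' cannot be parsed, so it is coerced to NaN; the rest become floats
result = pd.to_numeric(x, errors='coerce')
```

Note that pd.to_numeric returns a plain NumPy float array when given a NumPy array, so no round trip through a Series is needed.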