Converting strings to floats in a DataFrame
How do I convert a DataFrame column containing strings and NaN values to floats? There is also another column whose values are a mix of strings and floats; how do I convert that entire column to floats?
Solution 1:
NOTE: pd.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in other answers.
This is available in 0.11. It forces conversion (or sets the value to NaN). It works even when astype would fail. It also operates series by series, so it won't convert, say, a complete string column.
In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
     A    B
0  1.0  1.0
1    1  foo
In [12]: df.dtypes
Out[12]:
A object
B object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
   A    B
0  1    1
1  1  NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A float64
B float64
dtype: object
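Since convert_objects is deprecated, the same example could probably be written with pd.to_numeric instead. This is only a sketch: it assumes errors="coerce" is the behaviour you want (unparseable values become NaN), it builds the frame with dtype=object to mirror the session above, and the name converted is made up for illustration.

```python
import pandas as pd

# Same data as the session above, stored as object (string) columns.
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]}, dtype=object)

# Apply pd.to_numeric column by column; errors="coerce" turns values
# that cannot be parsed (such as "foo") into NaN instead of raising.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Unlike convert_objects, this converts every column it is applied to, so a column made up entirely of non-numeric strings would come back as all NaN. Select the columns you want first if that is not the intent.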
Solution 2:
You can try df.column_name = df.column_name.astype(float). NaN values stay NaN after the conversion; if you want them replaced with something else, you need to specify the replacement yourself, which you can do with the .fillna method.
Example:
In [12]: df
Out[12]:
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
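For reference, here is a self-contained version of the session above that also covers the second case from the question, a column mixing strings and floats. It is a sketch under assumptions: the contents of column b are invented for illustration, and filling NaN with 0.0 is just one possible choice.

```python
import numpy as np
import pandas as pd

# Column "a": strings and NaN. Column "b": a mix of strings and floats
# (hypothetical values chosen for this example).
df = pd.DataFrame(
    {"a": ["0.1", np.nan, "0.4"], "b": ["0.2", 0.3, "0.5"]},
    dtype=object,
)

# astype(float) parses the numeric strings and leaves NaN as NaN;
# fillna(0.0) then replaces the missing value with 0.0.
df["a"] = df["a"].astype(float).fillna(0.0)

# A column mixing numeric strings and floats converts the same way.
df["b"] = df["b"].astype(float)

print(df)
print(df.dtypes)
```

Note that astype(float) raises a ValueError if any string is not a valid number, so it only fits columns whose strings are all numeric.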