error using astype when NaN exists in a dataframe
If some values in a column are missing (NaN) and the column is then converted to numeric, the resulting NumPy dtype is always float. You cannot convert the values to int, only to float, because the type of NaN is float.
import numpy as np
print (type(np.nan))
<class 'float'>
See the docs for how values are converted if at least one NaN is present:
integer > cast to float64
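The promotion described above can be reproduced in a few lines. This is only a minimal sketch: the Series values and the use of reindex to introduce a missing value are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Illustrative data (an assumption, not from the original question).
s = pd.Series([10, 20, 40])
is_int_before = pd.api.types.is_integer_dtype(s)
print(s.dtype)  # an integer dtype

# Reindexing adds label 3, which has no value, so pandas inserts NaN
# and promotes the whole Series to float64.
s_with_nan = s.reindex([0, 1, 2, 3])
print(s_with_nan.dtype)  # float64

# Casting back to a NumPy int fails while NaN is still present.
try:
    s_with_nan.astype(int)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print('astype(int) failed:', exc)
```

The exact error message depends on the pandas version, but it should be a ValueError (or a subclass of it) in each case.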
If you need int values, you need to replace NaN with some int, e.g. 0, using fillna, and then it works:
# raw strings avoid the invalid escape sequence warning for \d
df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
df['B'] = df['B'].str.extract(r'(\d+)', expand=False)
print (df)
     A    B
0   10   20
1   20  NaN
2  NaN   30
3   40   40
df1 = df.fillna(0).astype(int)
print (df1)
    A   B
0  10  20
1  20   0
2   0  30
3  40  40
print (df1.dtypes)
A    int32
B    int32
dtype: object
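For a version that can be run from scratch, here is a rough sketch of the fillna approach. The input strings are hypothetical, since the original data is not shown, and pd.to_numeric is swapped in before fillna so the fill value and the column are both numeric; that is a choice made for this example, not something taken from the answer above.

```python
import pandas as pd

# Hypothetical input: strings that may or may not contain digits.
df = pd.DataFrame({'A': ['x10', 'x20', 'none', 'x40'],
                   'B': ['y20', 'none', 'y30', 'y40']})

df1 = pd.DataFrame(index=df.index)
for col in ['A', 'B']:
    # Pull out the first run of digits; rows without digits become NaN.
    digits = df[col].str.extract(r'(\d+)', expand=False)
    # Convert to numeric, replace missing values with 0, then cast to int.
    df1[col] = pd.to_numeric(digits).fillna(0).astype(int)

print(df1)
print(df1.dtypes)
```

Note that 0 then becomes indistinguishable from a real zero in the data, so this only fits cases where that is acceptable.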
Since pandas 0.24 there is a built-in nullable integer dtype.
It allows missing values in an integer column, so you don't need to fill the NaNs.
Notice the capital "I" in 'Int64' in the code below.
This is the pandas nullable integer, not the NumPy integer.
You need to use: .astype('Int64')
So, do this:
# raw strings avoid the invalid escape sequence warning for \d
df['A'] = df['A'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
df['B'] = df['B'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
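As a self-contained illustration of the same chain, assuming made-up input strings (the original data is not shown) and pandas 0.24 or newer:

```python
import pandas as pd

# Hypothetical input: one row has no digits to extract.
s = pd.Series(['x10', 'x20', 'none', 'x40'])

# Extract digits, go through float (which can hold NaN),
# then convert to the nullable pandas integer dtype.
result = s.str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')

print(result)
print(result.dtype)  # Int64
```

The missing row stays missing (shown as <NA> in recent pandas versions) rather than being replaced by a placeholder such as 0.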
More info on pandas integer NA values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions