Convert Pandas column containing NaNs to dtype `int`
Solution 1:
The lack of NaN rep in integer columns is a pandas "gotcha".
The usual workaround is to simply use floats.
Solution 2:
In version 0.24.+ pandas has gained the ability to hold integer dtypes with missing values.
Nullable Integer Data Type.
Pandas can represent integer data with possibly missing values using arrays.IntegerArray
. This is an extension types implemented within pandas. It is not the default dtype for integers, and will not be inferred; you must explicitly pass the dtype into array()
or Series
:
arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
pd.Series(arr)
0 1
1 2
2 NaN
dtype: Int64
For convert column to nullable integers use:
df['myCol'] = df['myCol'].astype('Int64')
Solution 3:
My use case is munging data prior to loading into a DB table:
df[col] = df[col].fillna(-1)
df[col] = df[col].astype(int)
df[col] = df[col].astype(str)
df[col] = df[col].replace('-1', np.nan)
Remove NaNs, convert to int, convert to str and then reinsert NANs.
It's not pretty but it gets the job done!
Solution 4:
It is now possible to create a pandas column containing NaNs as dtype int
, since it is now officially added on pandas 0.24.0
pandas 0.24.x release notes Quote: "Pandas has gained the ability to hold integer dtypes with missing values
Solution 5:
If you absolutely want to combine integers and NaNs in a column, you can use the 'object' data type:
df['col'] = (
df['col'].fillna(0)
.astype(int)
.astype(object)
.where(df['col'].notnull())
)
This will replace NaNs with an integer (doesn't matter which), convert to int, convert to object and finally reinsert NaNs.