Convert Pandas column containing NaNs to dtype `int`

Solution 1:

The lack of NaN rep in integer columns is a pandas "gotcha".

The usual workaround is to simply use floats.

Solution 2:

In version 0.24.+ pandas has gained the ability to hold integer dtypes with missing values.

Nullable Integer Data Type.

Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension types implemented within pandas. It is not the default dtype for integers, and will not be inferred; you must explicitly pass the dtype into array() or Series:

arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
pd.Series(arr)

0      1
1      2
2    NaN
dtype: Int64

For convert column to nullable integers use:

df['myCol'] = df['myCol'].astype('Int64')

Solution 3:

My use case is munging data prior to loading into a DB table:

df[col] = df[col].fillna(-1)
df[col] = df[col].astype(int)
df[col] = df[col].astype(str)
df[col] = df[col].replace('-1', np.nan)

Remove NaNs, convert to int, convert to str and then reinsert NANs.

It's not pretty but it gets the job done!

Solution 4:

It is now possible to create a pandas column containing NaNs as dtype int, since it is now officially added on pandas 0.24.0

pandas 0.24.x release notes Quote: "Pandas has gained the ability to hold integer dtypes with missing values

Solution 5:

If you absolutely want to combine integers and NaNs in a column, you can use the 'object' data type:

df['col'] = (
    df['col'].fillna(0)
    .astype(int)
    .astype(object)
    .where(df['col'].notnull())
)

This will replace NaNs with an integer (doesn't matter which), convert to int, convert to object and finally reinsert NaNs.