Pandas DataFrame: Why does the astype method produce int32 results with an argument of int?

I am using Python 3.8 and Pandas 1.3. Here is some sample code:

    import pandas as pd

    data_dc = {'Dates': ['10212021', '11152021', '01142022', '02122022']}
    df1 = pd.DataFrame(data_dc)
    print(df1['Dates'].astype(int))

Results:

    0    10212021
    1    11152021
    2     1142022
    3     2122022
    Name: Dates, dtype: int32

I passed a Python data type (int) as the argument of the astype method and expected the dtype of the Dates column to be int64. Instead, I got int32. Is this a bug, or am I doing something wrong? This is easy to work around, but I like to make sure I understand what to expect from the software.


Solution 1:

Pandas uses NumPy data types under the hood. From the NumPy documentation:

> The default NumPy behavior is to create arrays in either 32 or 64-bit signed integers (platform dependent and matches C int size) or double precision floating point numbers, int32/int64 and float, respectively. If you expect your integer arrays to be a specific type, then you need to specify the dtype while you create the array.

It is not a bug. You should specify the dtype explicitly if you need a particular integer width or want platform-independent behavior. To rephrase your question: what is np.dtype(int) on my platform?
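One way to answer that is to ask NumPy directly. The snippet below is a minimal sketch: the variable names are only illustrative, and the first result depends on your platform and NumPy version (for example, int64 on 64-bit Linux, and int32 on Windows with NumPy releases before 2.0).

```python
import numpy as np

# np.dtype(int) maps Python's int to NumPy's platform default integer dtype.
default_int = np.dtype(int)
print(default_int.name, default_int.itemsize * 8, "bits")

# A fixed-width dtype is the same everywhere, regardless of the default.
explicit = np.array([1, 2, 3], dtype=np.int64)
print(explicit.dtype.name, explicit.dtype.itemsize * 8, "bits")
```

If the first line prints a 32-bit type, astype(int) will give you int32 as well.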

On Windows, as some of the comments suggest, it appears to be a C signed long (32 bits). You can even get NumPy to raise an overflow error to confirm this.

    >>> import numpy as np
    >>> np.array([2_147_483_648], dtype=int)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
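For the DataFrame in the question, the platform-independent fix is to name the width you want rather than passing the Python int type. A short sketch reusing the question's sample data (dates_int64 is just an illustrative name):

```python
import pandas as pd

data_dc = {'Dates': ['10212021', '11152021', '01142022', '02122022']}
df1 = pd.DataFrame(data_dc)

# Ask for a fixed-width dtype instead of the platform-dependent int.
dates_int64 = df1['Dates'].astype('int64')
print(dates_int64.dtype)
```

Passing np.int64 instead of the string 'int64' should behave the same way.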