How does numpy determine the array data type when it contains multiple dtypes?
I am getting hands-on with NumPy and came across the following data type when using the built-in dtype attribute. Here is the result I got. Can you please explain what U11 means?
import numpy as np

a1 = np.array([3, 5, 'p'])
print(a1.dtype)
# Output: <U11
A NumPy array has a single dtype, so when the input holds items of different types NumPy must pick one common dtype that can represent all of them. This is done by type promotion (np.result_type and np.promote_types expose the rules), not by the __array_priority__ attribute. In the C API that attribute is read through PyArray_GetPriority, with NPY_PRIORITY being the default priority for arrays, and according to the documentation it only decides which Python type is returned:
class.__array_priority__: The value of this attribute is used to determine what type of object to return in situations where there is more than one possibility for the Python type of the returned object. Subclasses inherit a default value of 0.0 for this attribute.
Now, in this case every integer can be written as a string, but an arbitrary string cannot be converted to an integer, so the integers are promoted to Unicode and that's why a1.dtype returns U11.
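NumPy's promotion of integers to Unicode can be checked directly with np.promote_types. The following is a small sketch, using explicitly sized integer dtypes so that the result does not depend on the platform's default integer; the variable names are only for illustration.

```python
import numpy as np

# A one-character Unicode dtype, which is all the single item 'p' needs.
one_char = np.dtype('U1')

# Promote explicitly sized integers together with the string dtype.
from_int32 = np.promote_types(np.int32, one_char)
from_int64 = np.promote_types(np.int64, one_char)

print(from_int32)  # a Unicode dtype wide enough for any 32-bit integer
print(from_int64)  # a Unicode dtype wide enough for any 64-bit integer

# In the mixed array the integers end up stored as strings.
a1 = np.array([3, 5, 'p'])
print(a1.tolist())
```

If you want to keep the original Python objects instead of converting everything to strings, passing dtype=object to np.array is the usual alternative.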
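Regarding U11, or in general U#: it consists of two parts. The U denotes a Unicode dtype, and the # is the maximum number of characters that each element can hold. The number can differ between platforms because it is derived from the default integer type: a 32-bit integer needs up to 11 characters (as in "-2147483648"), so you get U11 where the default integer is 32-bit, while a 64-bit default integer gives U21: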
In [45]: a1.dtype
Out[45]: dtype('<U21') # 64bit Linux
In [46]: a1.dtype.type # The type object used to instantiate a scalar of this data-type.
Out[46]: numpy.str_
In [49]: a1.dtype.itemsize
Out[49]: 84 # 21 * 4
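The itemsize is four times the character count because NumPy stores each Unicode character in 4 bytes. Here is a quick check with explicitly constructed dtypes, so that it behaves the same on any platform; the names n_chars, dt and arr are only illustrative.

```python
import numpy as np

# itemsize is the per-element storage in bytes: 4 bytes per character.
for n_chars in (11, 21):
    dt = np.dtype(f'U{n_chars}')
    print(dt, dt.itemsize)

# The length is a hard limit: longer strings are truncated on assignment.
arr = np.array(['p'], dtype='U3')
arr[0] = 'python'
print(arr[0])
```

The truncation in the last step is why it can be worth choosing the string length explicitly when you know longer values will be assigned later.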
You can read more about string types and other data type objects in the documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.dtypes.html#data-type-objects-dtype