Pandas: Is Series homogeneous?

Is a pandas Series homogeneous or heterogeneous?

>>> import pandas as pd
>>> S = pd.Series([1, 2, 3, 1.5, 'US', True, False, 'India'])
>>> S
0        1
1        2
2        3
3      1.5
4       US
5     True
6    False
7    India
dtype: object
>>> S[3]
1.5
>>> type(S[5])
<class 'bool'>

By definition, and according to the documentation, a Series is homogeneous.

Series are defined as:

One-dimensional ndarray with axis labels (including time series).

An ndarray, in turn, is defined as:

An array object represents a multidimensional, homogeneous array of fixed-size items.

(Emphasis mine: "ndarray" in the first quote and "homogeneous" in the second.)

Series of dtype object, however, are tricky. Since virtually everything in Python is an object, a Series of this dtype can reference values of a huge number of different types. So while the Series itself is a homogeneous collection of references to objects, the individual objects being referenced may have heterogeneous types.

Accessing an individual value from the Series exposes that value's own type. The only guarantee we have, however, is that every element in the Series is of type object.
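As a minimal sketch of that distinction, the snippet below reuses the sample Series from the question and compares the dtype of the container with the types of the values it references. The variable names are only illustrative.

```python
import pandas as pd

# The sample Series from the question: the container has a single dtype...
s = pd.Series([1, 2, 3, 1.5, 'US', True, False, 'India'])
print(s.dtype)  # object

# ...but each referenced element keeps its own Python type.
element_types = [type(value).__name__ for value in s]
print(element_types)
```

The dtype describes the Series as a whole, while the list comprehension inspects each referenced object in turn.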


We most commonly get Series of dtype object either when working with strings, or when pulling rows from DataFrames whose columns have different dtypes.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z'],
    'C': [3.4, 5.6, 7.8]
})

s1 = df['B']
s2 = df.iloc[0, :]

s1 (Column B):

0    x
1    y
2    z
Name: B, dtype: object

s2 (Row 0):

A      1
B      x
C    3.4
Name: 0, dtype: object
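To show why the row ends up as dtype object, here is a small sketch that compares a row taken across all three columns with a row taken across the numeric columns only. It reuses the DataFrame above; mixed_row and numeric_row are names introduced just for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z'],
    'C': [3.4, 5.6, 7.8]
})

# Row across int, string and float columns: no common dtype, so object.
mixed_row = df.iloc[0, :]
print(mixed_row.dtype)

# Row across the numeric columns only: int64 and float64 upcast to float64.
numeric_row = df[['A', 'C']].iloc[0, :]
print(numeric_row.dtype)
```

The row only falls back to object when the columns it spans have no common dtype to upcast to.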

We can see that pandas treats a string column of dtype object and a mixed-type Series of dtype object the same way by calling a method like Series.str.upper on both:

s1.str.upper()

0    X
1    Y
2    Z
Name: B, dtype: object

s2.str.upper()

A    NaN
B      X
C    NaN
Name: 0, dtype: object

Notice that (in pandas 1.3.5) the numeric (int and float) values are silently turned into NaN: no error or warning is raised.

The same happens with the sample Series from the question:

pd.Series([1, 2, 3, 1.5, 'US', True, False, 'India']).str.upper()

0      NaN
1      NaN
2      NaN
3      NaN
4       US
5      NaN
6      NaN
7    INDIA
dtype: object

Again, this is because pandas sees this Series as a homogeneous collection of objects, which allows the use of the dtype-restricted .str methods.
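If the silent NaN results are a concern, one possible guard is to inspect the values before calling the .str methods. The sketch below uses pandas.api.types.infer_dtype for that; is_all_strings is a hypothetical helper written for this example, not part of pandas.

```python
import pandas as pd
from pandas.api.types import infer_dtype

def is_all_strings(series):
    """Hypothetical helper: True only if every non-missing value is a str."""
    return infer_dtype(series, skipna=True) == "string"

strings = pd.Series(['x', 'y', 'z'], dtype=object)
mixed = pd.Series([1, 'x', 3.4], dtype=object)

print(is_all_strings(strings))
print(is_all_strings(mixed))

# Only call the .str method when the values really are all strings.
if is_all_strings(mixed):
    print(mixed.str.upper())
else:
    print("not all strings; skipping .str.upper()")
```

This is only one option. Unlike the dtype attribute, infer_dtype scans the values themselves, so it costs more on a large Series.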