Pandas: Is a Series homogeneous?

Is a pandas Series homogeneous or heterogeneous?

>>> import pandas as pd
>>> S = pd.Series([1, 2, 3, 1.5, 'US', True, False, 'India'])
>>> S
0        1
1        2
2        3
3      1.5
4       US
5     True
6    False
7    India
dtype: object
>>> S[3]
1.5
>>> type(S[5])
<class 'bool'>

By definition, and according to the documentation, a Series is homogeneous.

Series are defined as:

One-dimensional ndarray with axis labels (including time series).

ndarray are defined as:

An array object represents a multidimensional, homogeneous array of fixed-size items.

(bolding mine in both quotes)

Series of dtype object, however, are tricky. Since virtually everything in Python can be considered an object, a huge number of different types can be referenced by this kind of Series. So while the Series itself is a homogeneous collection of references to objects, the individual objects being referenced may have heterogeneous types.

Accessing an individual value from the Series exposes that value's own type. The only guarantee we have, however, is that every element in the Series is of type object.
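As a minimal sketch of that distinction, using the sample Series from the question: the Series reports one dtype, while inspecting the elements one by one shows several Python types. The variable names here are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1.5, 'US', True, False, 'India'])

# The Series as a whole has exactly one dtype: object.
series_dtype = s.dtype

# Each element, once pulled out of the Series, still carries its own type.
element_type_names = [type(value).__name__ for value in s]

print(series_dtype)
print(sorted(set(element_type_names)))
```

The container is uniform (every slot holds an object reference), but the things those references point at are not.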

We most commonly get a Series of dtype object either when working with strings, or when pulling a row from a DataFrame whose columns have different dtypes.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z'],
    'C': [3.4, 5.6, 7.8]
})

s1 = df['B']
s2 = df.iloc[0, :]

s1 (Column B):

0    x
1    y
2    z
Name: B, dtype: object

s2 (Row 0):

A      1
B      x
C    3.4
Name: 0, dtype: object

Calling a method like Series.str.upper on each shows that pandas does not distinguish between a string column of dtype object and a column of mixed types that is also of dtype object:

s1.str.upper()

0    X
1    Y
2    Z
Name: B, dtype: object

s2.str.upper()

A    NaN
B      X
C    NaN
Name: 0, dtype: object

Notice that (in pandas 1.3.5) the numeric (int and float) values have been turned into NaN. No error or warning is raised.

(This also works for the provided sample Series)

pd.Series([1, 2, 3, 1.5, 'US', True, False, 'India']).str.upper()

0      NaN
1      NaN
2      NaN
3      NaN
4       US
5      NaN
6      NaN
7    INDIA
dtype: object

Again, this is because pandas sees this Series as a homogeneous collection of objects, which allows the use of the dtype-restricted .str methods.
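If you need to tell the two cases apart yourself, one option is pandas.api.types.infer_dtype, which inspects the values rather than the dtype. The sketch below reuses the DataFrame from this answer; the exact label returned for the mixed row can vary with its contents, so the code only checks whether the label is 'string'.

```python
import pandas as pd
from pandas.api.types import infer_dtype

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z'],
    'C': [3.4, 5.6, 7.8]
})

s1 = df['B']         # column that holds only strings
s2 = df.iloc[0, :]   # row that mixes int, str and float

# infer_dtype looks at the actual values, not at the dtype attribute.
s1_kind = infer_dtype(s1)
s2_kind = infer_dtype(s2)

print(s1_kind)              # 'string'
print(s2_kind == 'string')  # False: the row is not all strings
```

A check like this before calling .str methods avoids silently getting NaN for the non-string elements.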
