Python Pandas : Drop Duplicates Function - Unusual Behaviour

drop_duplicates hashes the values so that it can efficiently keep track of which ones it has already seen.

Lists are not hashable (because they are mutable), so you can't use drop_duplicates directly on a column that contains lists. When you save and reload the data, the lists are most likely converted to strings, which are hashable, so the call succeeds.

To work around this, convert the lists to tuples, which are hashable:

df['col1'] = df['col1'].apply(tuple)
# now this runs with no error
df.drop_duplicates(subset=['col1', 'col2', 'col3'], keep='last', inplace=True)
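For a reproducible version of the above, here is a minimal sketch. The sample data is made up for illustration (only the column names col1, col2, col3 come from the question), and it uses the non-inplace form of drop_duplicates so that the result can be inspected separately:

```python
import pandas as pd

# Hypothetical sample data: col1 holds lists, which are not hashable.
df = pd.DataFrame({
    "col1": [[1], [1], [2]],
    "col2": ["a", "a", "b"],
    "col3": [10, 10, 20],
})

# With lists in col1, drop_duplicates fails because it cannot hash them.
try:
    df.drop_duplicates(subset=["col1", "col2", "col3"], keep="last")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Tuples are immutable and hashable, so the same call now works.
df["col1"] = df["col1"].apply(tuple)
deduped = df.drop_duplicates(subset=["col1", "col2", "col3"], keep="last")
print(deduped)
```

Rows 0 and 1 are identical once col1 holds tuples, so with keep='last' only rows 1 and 2 should remain.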

This happens because, even though both columns have dtype object, the items stored in them are of different types. The original frame holds lists, while the reloaded one holds strings:

>>> df.loc[0,'col1']
[1]


>>> df_.loc[0, 'col1']
'[1]'

Since strings are hashable, you don't see the error that you had before with lists.
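The difference can be reproduced with a save/load round trip. This is only a sketch: it assumes the data was saved as CSV (the question does not say which format was used) and uses an in-memory buffer instead of a file, with made-up sample data:

```python
import io

import pandas as pd

# Hypothetical sample data: a column of lists.
df = pd.DataFrame({"col1": [[1], [1], [2]]})

# Round trip through CSV, using an in-memory buffer in place of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_ = pd.read_csv(buffer)

# The original frame holds lists; the reloaded one holds their string form.
print(type(df.loc[0, "col1"]))
print(type(df_.loc[0, "col1"]))

# Strings are hashable, so drop_duplicates works on the reloaded frame.
deduped = df_.drop_duplicates()
print(deduped)
```

Note that the reloaded values are compared as text, so two lists count as duplicates only if their string representations match exactly.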