Python Pandas : Drop Duplicates Function - Unusual Behaviour

drop_duplicates hashes the objects so that it can efficiently keep track of which values it has already seen.

Lists are not hashable (because they are mutable), so you can't use drop_duplicates directly on a column that contains them. When you save and load the data, the lists are most likely converted to strings, which are hashable, so the hash can be calculated.
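
To see the difference in plain Python, here is a minimal sketch. The helpers is_hashable and drop_seen are hypothetical names used only for illustration; drop_seen is a simplified picture of hash-based deduplication, not pandas' actual implementation.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable([1]))    # list: not hashable
print(is_hashable((1,)))   # tuple of hashable items: hashable
print(is_hashable("[1]"))  # str: hashable


def drop_seen(items):
    """Keep the first occurrence of each item, tracking seen items in a set."""
    seen = set()
    kept = []
    for item in items:
        if item not in seen:  # set membership requires hashing the item
            seen.add(item)
            kept.append(item)
    return kept


print(drop_seen([(1,), (1,), (2,)]))
```

Passing a list of lists to drop_seen would fail with TypeError: unhashable type: 'list', which is the same kind of error drop_duplicates reports.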

To overcome the issue, you can convert the lists to tuples, which are hashable:

df['col1'] = df['col1'].apply(tuple)
# now this runs with no error
df.drop_duplicates(subset=['col1', 'col2', 'col3'], keep='last', inplace=True)
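
For a self-contained version of the same idea, here is a sketch on a small made-up DataFrame. The data is invented for illustration; only the column names follow the snippet above. It works on a copy via assign instead of inplace=True, which is a stylistic choice rather than a requirement.

```python
import pandas as pd

# Hypothetical data: 'col1' holds lists, so its cells are unhashable.
df = pd.DataFrame({
    "col1": [[1], [1], [2]],
    "col2": ["a", "a", "b"],
    "col3": [10, 10, 20],
})

try:
    df.drop_duplicates(subset=["col1", "col2", "col3"], keep="last")
except TypeError as exc:
    # Expected with list cells: the lists cannot be hashed.
    print("drop_duplicates failed:", exc)

# Convert the lists to tuples on a copy, then drop duplicates.
deduped = df.assign(col1=df["col1"].apply(tuple)).drop_duplicates(
    subset=["col1", "col2", "col3"], keep="last"
)
print(deduped)
```

With keep='last', the second of the two identical rows is the one that survives, so the original index labels of the result are 1 and 2.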

Even though both columns have dtype object, the items in them are of different types:

>>> df.loc[0,'col1']
[1]


>>> df_.loc[0, 'col1']
'[1]'

Since strings are hashable, you don't see the error that you had before with lists.
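
The round trip can be reproduced with a short sketch. It assumes the data was saved as CSV (the storage format is not stated above) and uses an in-memory buffer in place of a file; the data is made up. The last step, parsing the strings back with ast.literal_eval, is an optional extra and only appropriate when the file contents are trusted to be plain Python literals.

```python
import ast
import io

import pandas as pd

# Hypothetical data with a column of lists.
df = pd.DataFrame({"col1": [[1], [1], [2]], "col2": ["a", "a", "b"]})

# Round-trip through CSV in memory, standing in for save and load.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_ = pd.read_csv(buffer)

print(type(df.loc[0, "col1"]))   # a list in the original frame
print(type(df_.loc[0, "col1"]))  # a str after reloading

# Strings are hashable, so this runs; it compares the text of each cell.
deduped = df_.drop_duplicates(keep="last")
print(deduped)

# Optional: turn the strings back into Python objects, then into tuples.
restored = df_["col1"].apply(ast.literal_eval).apply(tuple)
print(restored.tolist())
```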
