Comparing pandas map and merge
map will be faster than a merge.
If your goal is simply to assign a numerical category to each unique value in df['AB'], you could use pandas.factorize, which should be a bit faster than map:
res = df['AB'].factorize()[0]+1
output: array([1, 1, 1, 2, 2, 3, 3, 3])
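For comparison, here is a minimal sketch of the three approaches side by side. The sample DataFrame, the `mapping` dictionary, and the `lookup` table are assumptions made up for illustration, since the original df is not shown; the point is only that all three produce the same codes.

```python
import pandas as pd

# Hypothetical sample data; the original df is not shown in the answer.
df = pd.DataFrame({"AB": ["x", "x", "x", "y", "y", "z", "z", "z"]})

# 1) factorize: codes start at 0 in order of first appearance, so add 1.
codes_factorize = df["AB"].factorize()[0] + 1

# 2) map: build a value -> code dictionary, then look each row up.
mapping = {value: i for i, value in enumerate(df["AB"].unique(), start=1)}
codes_map = df["AB"].map(mapping).to_numpy()

# 3) merge: left-join against a small lookup table holding the same codes.
lookup = pd.DataFrame({"AB": list(mapping), "code": list(mapping.values())})
codes_merge = df.merge(lookup, on="AB", how="left")["code"].to_numpy()

print(codes_factorize)  # [1 1 1 2 2 3 3 3]
```

Note that factorize numbers the values in order of first appearance, so the codes only match a hand-built mapping if that mapping uses the same order.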
Timings on 800k rows:
factorize 28.6 ms ± 153 µs
map 32.1 ms ± 110 µs
merge 68.6 ms ± 1.33 ms
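If you want to rerun a comparison like this yourself, something along these lines should work. It is only a rough sketch: the data (800k rows drawn from 1,000 distinct integer keys), the `mapping` dictionary, and the `lookup` table are my assumptions, not the data used for the numbers above, so expect different absolute timings on your machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 800k rows drawn from 1,000 distinct integer keys.
rng = np.random.default_rng(0)
df = pd.DataFrame({"AB": rng.integers(0, 1000, size=800_000)})

# Precompute the dictionary and lookup table so only the assignment is timed.
uniques = df["AB"].unique()
mapping = dict(zip(uniques, range(1, len(uniques) + 1)))
lookup = pd.DataFrame({"AB": uniques, "code": range(1, len(uniques) + 1)})

candidates = {
    "factorize": lambda: df["AB"].factorize()[0] + 1,
    "map": lambda: df["AB"].map(mapping),
    "merge": lambda: df.merge(lookup, on="AB", how="left")["code"],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    timings[name] = best * 1e3
    print(f"{name:<10}{timings[name]:8.1f} ms")
```

The relative ordering can shift with the dtype of the column and the number of unique values, so it is worth timing on data shaped like your own.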