Pandas DENSE RANK
I'm working with a pandas DataFrame that looks like this:
Year Value
2012 10
2013 20
2013 25
2014 30
I want the equivalent of SQL's DENSE_RANK() OVER (ORDER BY Year), so that I get an additional column like this:
Year Value Rank
2012 10 1
2013 20 2
2013 25 2
2014 30 3
How can it be done in pandas?
Thanks!
Solution 1:
Use pd.Series.rank with method='dense':
df['Rank'] = df.Year.rank(method='dense').astype(int)
df
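For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question (the construction via pd.DataFrame is my assumption about how the data was created) and applies the same rank call:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014],
                   "Value": [10, 20, 25, 30]})

# method='dense' gives tied values the same rank and leaves no gaps;
# rank() returns floats, hence the cast to int.
df["Rank"] = df["Year"].rank(method="dense").astype(int)

print(df["Rank"].tolist())  # [1, 2, 2, 3]
```

Because rank works on the values themselves, the result does not depend on the row order of the frame.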
Solution 2:
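The fastest solution in the timings below is factorize. Note that it numbers values in order of first appearance, so it matches DENSE_RANK only when Year is already sorted (as it is here); otherwise pass sort=True to pd.factorize: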
df['Rank'] = pd.factorize(df.Year)[0] + 1
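One caveat worth illustrating: factorize assigns codes in order of first appearance, so on unsorted data it only agrees with a dense rank if you pass sort=True. A small sketch, using a made-up frame named `unsorted` (not from the question):

```python
import pandas as pd

# Hypothetical frame with Year deliberately out of order.
unsorted = pd.DataFrame({"Year": [2013, 2012, 2014, 2013]})

# Default factorize: codes follow order of first appearance.
by_appearance = (pd.factorize(unsorted["Year"])[0] + 1).tolist()

# sort=True: codes follow the sorted unique values, like DENSE_RANK.
by_value = (pd.factorize(unsorted["Year"], sort=True)[0] + 1).tolist()

# Reference result from rank(method='dense').
dense = unsorted["Year"].rank(method="dense").astype(int).tolist()

print(by_appearance)  # [1, 2, 3, 1]
print(by_value)       # [2, 1, 3, 2]
print(dense)          # [2, 1, 3, 2]
```

On the sorted frame from the question, both variants give the same answer.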
Timings:
#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)
In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop
In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop
In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 µs per loop