Custom sorting in pandas dataframe
Solution 1:
Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:
First make the month column a categorical and specify the ordering to use.
In [21]: df['m'] = pd.Categorical(df['m'], ["March", "April", "Dec"])
In [22]: df # looks the same!
Out[22]:
a b m
0 1 2 March
1 5 6 Dec
2 3 4 April
Now, when you sort the month column it will sort with respect to that list:
In [23]: df.sort_values("m")
Out[23]:
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec
Note: if a value is not in the list it will be converted to NaN.
An older answer for those interested...
You could create an intermediary series, and set_index
on that:
df = pd.DataFrame([[1, 2, 'March'],[5, 6, 'Dec'],[3, 4, 'April']], columns=['a','b','m'])
s = df['m'].apply(lambda x: {'March':0, 'April':1, 'Dec':3}[x])
s.sort_values()
In [4]: df.set_index(s.index).sort()
Out[4]:
a b m
0 1 2 March
1 3 4 April
2 5 6 Dec
As commented, in newer pandas, Series has a replace
method to do this more elegantly:
s = df['m'].replace({'March':0, 'April':1, 'Dec':3})
The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).
Solution 2:
pandas >= 1.1
You will soon be able to use sort_values
with key
argument:
pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'
custom_dict = {'March': 0, 'April': 1, 'Dec': 3}
df
a b m
0 1 2 March
1 5 6 Dec
2 3 4 April
df.sort_values(by=['m'], key=lambda x: x.map(custom_dict))
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec
The key
argument takes as input a Series and returns a Series. This series is internally argsorted and the sorted indices are used to reorder the input DataFrame. If there are multiple columns to sort on, the key function will be applied to each one in turn. See Sorting with keys.
pandas <= 1.0.X
One simple method is using the output Series.map
and Series.argsort
to index into df
using DataFrame.iloc
(since argsort produces sorted integer positions); since you have a dictionary; this becomes easy.
df.iloc[df['m'].map(custom_dict).argsort()]
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec
If you need to sort in descending order, invert the mapping.
df.iloc[(-df['m'].map(custom_dict)).argsort()]
a b m
1 5 6 Dec
2 3 4 April
0 1 2 March
Note that this only works on numeric items. Otherwise, you will need to workaround this using sort_values
, and accessing the index:
df.loc[df['m'].map(custom_dict).sort_values(ascending=False).index]
a b m
1 5 6 Dec
2 3 4 April
0 1 2 March
More options are available with astype
(this is deprecated now), or pd.Categorical
, but you need to specify ordered=True
for it to work correctly.
# Older version,
# df['m'].astype('category',
# categories=sorted(custom_dict, key=custom_dict.get),
# ordered=True)
df['m'] = pd.Categorical(df['m'],
categories=sorted(custom_dict, key=custom_dict.get),
ordered=True)
Now, a simple sort_values
call will do the trick:
df.sort_values('m')
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec
The categorical ordering will also be honoured when groupby
sorts the output.