Sort a pandas dataframe series by month name
You can use categorical data to enable proper sorting with pd.Categorical:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
df.sort_values(...) # same as you have now; can use inplace=True
When you specify the categories, pandas remembers the order of specification as the default sort order.
Docs: Pandas categories > sorting & order.
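For reference, here is a minimal end-to-end sketch of the approach above. The sample data and the "sales" column are made up for illustration; only the pd.Categorical call comes from the answer itself.

```python
import pandas as pd

# Hypothetical sample data; the "sales" column is an assumption.
df = pd.DataFrame({"months": ["Mar", "Jan", "Dec", "Feb"],
                   "sales": [3, 1, 12, 2]})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Values that are not listed in `categories` become NaN,
# so the spelling in the column must match the list exactly.
df["months"] = pd.Categorical(df["months"], categories=months, ordered=True)

# Sorting now follows the category order instead of alphabetical order.
sorted_df = df.sort_values("months")
print(sorted_df)
```

Without the categorical conversion, the same sort_values call would put "Dec" first, since plain strings sort alphabetically.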
If the month names are the index, consider reindexing along axis 0 (the row index):
new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df1 = df.reindex(new_order, axis=0)
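A small sketch of how this behaves, assuming a frame whose index holds unique full month names (the data and the "Price" column are invented here). Note that reindex inserts NaN rows for months missing from the data and raises an error if the index contains duplicate labels, so it fits best after aggregation.

```python
import pandas as pd

# Hypothetical data indexed by full month names (labels must be unique).
df = pd.DataFrame({"Price": [12, 40, 11]},
                  index=["December", "January", "March"])

new_order = ["January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December"]

# Rows are rearranged to follow new_order; absent months become NaN rows.
df1 = df.reindex(new_order, axis=0)

# Optionally drop the months that had no data.
df1 = df1.dropna(how="all")
print(df1)
```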
Thanks to @Brad Solomon for suggesting a faster way to capitalize strings!
Note 1: @Brad Solomon's answer using pd.Categorical should use fewer resources than mine. He shows how to assign an order to your categorical data, so don't miss it :P
Alternatively, you can convert the month names to month numbers and sort on those:
df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()
# Now the dataset should look like
# Month Price
# -----------
# Dec XX
# Jan XX
# Mar XX
# make it a datetime so that we can sort it:
# use %b because the data use the abbreviation of month
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")
total = (df.groupby(df['Month'])['Price'].mean())
# total
Month
1 17.333333
3 11.000000
8 16.000000
12 12.000000
Note 2: groupby sorts the group keys by default. Make sure you use the same key for sorting and for grouping, as in df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.
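To illustrate the point about group-key ordering, here is a small sketch with made-up data. It compares the default behavior with sort=False, which keeps groups in order of first appearance.

```python
import pandas as pd

# Hypothetical data, deliberately not in alphabetical or calendar order.
df = pd.DataFrame({"Month": ["Mar", "Jan", "Dec", "Jan"],
                   "Price": [11, 40, 12, 10]})

# Default (sort=True): group keys are sorted, here alphabetically
# because Month is still a plain string column.
by_default = df.groupby("Month")["Price"].mean()

# sort=False: groups appear in the order they are first seen in df.
in_order = df.groupby("Month", sort=False)["Price"].mean()

print(by_default.index.tolist())
print(in_order.index.tolist())
```

So any row order established by an earlier sort_values on a different key is not reflected in the grouped result unless sort=False is passed.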
Note 3: A more computationally efficient approach is to compute the mean first and then sort by month. That way you sort at most 12 items rather than the whole df, which reduces the cost if you don't need df itself to be sorted.
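One possible way to do this, sketched with the sample data from this answer. Ordering the result with reindex and a list of month abbreviations is my own choice for the illustration; mapping to month numbers would work as well.

```python
import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
df["Month"] = df["Month"].str.capitalize()

# Aggregate first: the result has at most 12 rows, one per month.
means = df.groupby("Month", sort=False)["Price"].mean()

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Order the small result by calendar month, skipping months with no data.
total = means.reindex([m for m in months if m in means.index])
print(total)
```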
Note 4: If month is already your index and you want to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of making a categorical index ordered in "Pandas series sort by month index".
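As a rough sketch of the idea (this is not @jezrael's code, and the Series below is invented), an ordered CategoricalIndex lets sort_index follow calendar order:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical Series with month abbreviations as the index.
s = pd.Series([12, 40, 11], index=["Dec", "Jan", "Mar"])

# Replace the plain index with an ordered categorical one.
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)

# sort_index now uses the category order rather than alphabetical order.
s_sorted = s.sort_index()
print(s_sorted)
```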