why is my column index not sorted properly?
I want to sort such a dataframe(df) and I want to sort the columns by index on ascending order so I type in:
df_sorted=df.sort_index(axis=1,level=int)
however it returns me the order like this... isn't the column should be ordered in int like 1,2,3,....115?
1 10 100 101 102 103 104 ... 94 95 96 97 98 99 Unnamed: 0
0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 2
2 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 3
3 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 4
4 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 5
.. .. .. ... ... ... ... ... ... .. .. .. .. .. .. ...
111 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 110
112 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 111
113 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 112
114 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 113
115 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 114
Solution 1:
Check your index type, it should be object
df.columns.dtype
dtype('O')
We need to convert to numeric
out = df.iloc[:,pd.to_numeric(df.columns,errors='coerce').argsort()]
Or
df.columns = df.columns.astype(object)
df = df.sort_index(axis=1)
With another package
import natsort as ns
out = df.iloc[:,ns.index_natsorted(df.columns)]
Solution 2:
You can use natsort
.
Install with:
python -m pip install natsort
import pandas as pd
from natsort import natsorted
data= {
'10': [1,2],
'111': [4,4],
'100': [2,3],
'1000': [6,6],
'102': [0,5],
}
df = pd.DataFrame(data)
print(df.reindex(natsorted(df.columns), axis=1))
Output:
10 100 102 111 1000
0 1 2 0 4 6
1 2 3 5 4 6