Natural sorting
I have some files that need to be sorted by name, unfortunately I can't use a regular sort, because I also want to sort the numbers in the string, so I did some research and found that what I am looking for is called natural sorting.
I tried the solution given here and it worked perfectly.
However, for strings like PresserInc-1_10.jpg
and PresserInc-1_11.jpg
which causes that specific natural key algorithm to fail, because it only matches the first integer which in this case would be 1
and 1
, and so it throws off the sorting. So what I think might help is to match all numbers in the string and group them together, so if I have PresserInc-1_11.jpg
the algorithm should give me 111
back, so my question is, is this possible ?
Here's a list of filenames:
files = ['PresserInc-1.jpg', 'PresserInc-1_10.jpg', 'PresserInc-1_11.jpg', 'PresserInc-10.jpg', 'PresserInc-2.jpg', 'PresserInc-3.jpg', 'PresserInc-4.jpg', 'PresserInc-5.jpg', 'PresserInc-6.jpg', 'PresserInc-11.jpg']
Google: Python natural sorting.
Result 1: The page you linked to.
But don't stop there!
Result 2: Jeff Atwood's blog that explains how to do it properly.
Result 3: An answer I posted based on Jeff Atwood's blog.
Here's the code from that answer:
import re
def natural_sort(l):
convert = lambda text: int(text) if text.isdigit() else text.lower()
alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
return sorted(l, key=alphanum_key)
Results for your data:
PresserInc-1.jpg PresserInc-1_10.jpg PresserInc-1_11.jpg PresserInc-2.jpg PresserInc-3.jpg etc...
See it working online: ideone
If you don't mind third party libraries, you can use natsort to achieve this.
>>> import natsort
>>> files = ['PresserInc-1.jpg', 'PresserInc-1_10.jpg', 'PresserInc-1_11.jpg', 'PresserInc-10.jpg', 'PresserInc-2.jpg', 'PresserInc-3.jpg', 'PresserInc-4.jpg', 'PresserInc-5.jpg', 'PresserInc-6.jpg', 'PresserInc-11.jpg']
>>> natsort.natsorted(files)
['PresserInc-1.jpg',
'PresserInc-1_10.jpg',
'PresserInc-1_11.jpg',
'PresserInc-2.jpg',
'PresserInc-3.jpg',
'PresserInc-4.jpg',
'PresserInc-5.jpg',
'PresserInc-6.jpg',
'PresserInc-10.jpg',
'PresserInc-11.jpg']