How to sort two lists (which reference each other) in the exact same way
One classic approach to this problem is the "decorate, sort, undecorate" idiom, which is especially simple with Python's built-in zip function:
>>> list1 = [3, 2, 4, 1, 1]
>>> list2 = ['three', 'two', 'four', 'one', 'one2']
>>> list1, list2 = zip(*sorted(zip(list1, list2)))
>>> list1
(1, 1, 2, 3, 4)
>>> list2
('one', 'one2', 'two', 'three', 'four')
These of course are no longer lists, but that's easily remedied, if it matters:
>>> list1, list2 = (list(t) for t in zip(*sorted(zip(list1, list2))))
>>> list1
[1, 1, 2, 3, 4]
>>> list2
['one', 'one2', 'two', 'three', 'four']
It's worth noting that the above may sacrifice speed for terseness; the in-place version, which takes up 3 lines, is a tad faster on my machine for small lists:
>>> %timeit zip(*sorted(zip(list1, list2)))
100000 loops, best of 3: 3.3 us per loop
>>> %timeit tups = zip(list1, list2); tups.sort(); zip(*tups)
100000 loops, best of 3: 2.84 us per loop
On the other hand, for larger lists, the one-line version could be faster:
>>> %timeit zip(*sorted(zip(list1, list2)))
100 loops, best of 3: 8.09 ms per loop
>>> %timeit tups = zip(list1, list2); tups.sort(); zip(*tups)
100 loops, best of 3: 8.51 ms per loop
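The timings above come from IPython's %timeit on an interpreter where zip returns a list. A rough way to repeat the comparison on Python 3 with the standard timeit module might look like the sketch below. The function names one_line and in_place and the list size of 10,000 are assumptions made for illustration (the size behind the "larger lists" figures is not stated), and the numbers will vary by machine.

```python
import random
import timeit

def one_line(a, b):
    # zip -> sorted -> unzip, all in a single expression
    return list(zip(*sorted(zip(a, b))))

def in_place(a, b):
    tups = list(zip(a, b))  # zip() is lazy in Python 3, so materialize it first
    tups.sort()             # sort the list of pairs in place
    return list(zip(*tups))

# Assumed test data: two lists of 10,000 random floats.
random.seed(0)
big1 = [random.random() for _ in range(10_000)]
big2 = [random.random() for _ in range(10_000)]

for fn in (one_line, in_place):
    # Best of 3 repeats, 10 calls per repeat.
    seconds = min(timeit.repeat(lambda: fn(big1, big2), number=10, repeat=3))
    print(f"{fn.__name__}: {seconds / 10 * 1e3:.2f} ms per call")
```

Both functions return the same pair of tuples, so any difference is purely in running time.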
As Quantum7 points out, JSF's suggestion is a bit faster still, but it will probably only ever be a little bit faster, because Python uses the very same DSU idiom internally for all key-based sorts. It's just happening a little closer to the bare metal. (This shows just how well optimized the zip routines are!)
I think the zip-based approach is more flexible and a little more readable, so I prefer it.
Note that when elements of list1 are equal, this approach will end up comparing elements of list2. If elements of list2 don't support comparison, or don't produce a boolean when compared (for example, if list2 is a list of NumPy arrays), this will fail. And if elements of list2 are very expensive to compare, it might be better to avoid the comparison anyway.
In that case, you can sort indices as suggested in jfs's answer, or you can give the sort a key function that avoids comparing elements of list2:
result1, result2 = zip(*sorted(zip(list1, list2), key=lambda x: x[0]))
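To see the difference concretely, here is a small sketch with made-up data: the names keys and payloads are hypothetical, and the payloads are dicts, which do not support < in Python 3. Two keys are tied on purpose so that the plain tuple sort is forced to compare payloads.

```python
# Hypothetical data: two keys are tied, and the payloads are dicts,
# which cannot be ordered with "<" in Python 3.
keys = [2, 1, 2]
payloads = [{"id": "a"}, {"id": "b"}, {"id": "c"}]

try:
    sorted(zip(keys, payloads))  # the tie on 2 forces a dict comparison
except TypeError as exc:
    print("plain tuple sort failed:", exc)

# Sorting on the first element only never compares the payloads.
pairs = sorted(zip(keys, payloads), key=lambda pair: pair[0])
sorted_keys, sorted_payloads = (list(t) for t in zip(*pairs))
print(sorted_keys)
print(sorted_payloads)
```

Because the sort is stable, payloads with equal keys keep their original relative order.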
Also, the use of zip(*...) as a transpose fails when the input is empty. If your inputs might be empty, you will have to handle that case separately.
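One possible way to handle the empty case is to wrap the idiom in a small helper. The function name sort_together below is made up for illustration, and the sketch assumes you want two new lists back and want to sort by the first sequence only.

```python
def sort_together(first, second):
    """Sort two parallel sequences by `first`; return two new lists."""
    pairs = sorted(zip(first, second), key=lambda pair: pair[0])
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return [], []
    sorted_first, sorted_second = zip(*pairs)
    return list(sorted_first), list(sorted_second)

print(sort_together([], []))
print(sort_together([3, 2, 4, 1, 1], ['three', 'two', 'four', 'one', 'one2']))
```

Without the explicit check, the two-name unpacking raises ValueError on empty input because there are no tuples to unpack.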
You can sort indexes using values as keys:
indexes = list(range(len(list1)))  # range objects have no .sort() in Python 3
indexes.sort(key=list1.__getitem__)
To get sorted lists given sorted indexes:
sorted_list1 = list(map(list1.__getitem__, indexes))  # map() is lazy in Python 3
sorted_list2 = list(map(list2.__getitem__, indexes))
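Put together as one runnable snippet, the index-based approach might look like this on Python 3. It uses the sample lists from the question; the name order and the list comprehensions are just one way to spell it.

```python
list1 = [3, 2, 4, 1, 1]
list2 = ['three', 'two', 'four', 'one', 'one2']

# Positions of list1's elements in sorted order; only list1 is ever compared.
order = sorted(range(len(list1)), key=list1.__getitem__)

# Apply the same permutation to every parallel list.
sorted_list1 = [list1[i] for i in order]
sorted_list2 = [list2[i] for i in order]

print(order)
print(sorted_list1)
print(sorted_list2)
```

The same order can be reused for any number of additional parallel lists, and elements of list2 never need to be comparable.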
In your case, you shouldn't have separate list1 and list2 at all, but rather a single list of pairs:
data = [(3, 'three'), (2, 'two'), (4, 'four'), (1, 'one'), (1, 'one2')]
It is easy to create; it is easy to sort in Python:
data.sort() # sort using a pair as a key
Sort by the first value only:
data.sort(key=lambda pair: pair[0])
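The two calls differ only when first values are tied. A small sketch with made-up data (the strings are chosen so that the tied pairs are out of alphabetical order) shows the difference; operator.itemgetter(0) is used here as an equivalent of the lambda.

```python
from operator import itemgetter

# Hypothetical data: the two pairs with key 1 are deliberately out of
# alphabetical order so that the two sorts give different results.
data = [(2, 'two'), (1, 'zebra'), (1, 'apple')]

by_pair = sorted(data)                    # ties broken by the second item
by_key = sorted(data, key=itemgetter(0))  # ties keep their input order

print(by_pair)
print(by_key)
```

Sorting by the whole pair reorders the tied entries alphabetically, while sorting by the first value only leaves them in their original order because the sort is stable.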
I used the answer given by senderle for a long time, until I discovered np.argsort.
Here is how it works.
import numpy as np

# Indexing with idx works on np.array objects, not on plain lists.
list1 = np.array([3, 2, 4, 1])
list2 = np.array(["three", "two", "four", "one"])
idx = np.argsort(list1)  # indices that would sort list1
list1 = list1[idx]
list2 = list2[idx]
I find this solution more intuitive, and it works really well. The performance:
def sorting(l1, l2):
    # l1 and l2 have to be NumPy arrays
    idx = np.argsort(l1)
    return l1[idx], l2[idx]
# list1 and list2 are np.arrays here...
%timeit sorting(list1, list2)
100000 loops, best of 3: 3.53 us per loop
# This works best when the lists are NOT np.array
%timeit zip(*sorted(zip(list1, list2)))
100000 loops, best of 3: 2.41 us per loop
# 0.01us better for np.array (I think this is negligible)
%timeit tups = zip(list1, list2); tups.sort(); zip(*tups)
100000 loops, best of 3: 1.96 us per loop
Even though np.argsort isn't the fastest one, I find it easier to use.
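If the inputs start out as plain lists and contain duplicate keys, a variant along these lines may be useful. It is only a sketch: it assumes NumPy is installed, uses the sample lists from the question, and passes kind='stable' so that tied keys keep their original relative order (the default sort kind does not promise that).

```python
import numpy as np

list1 = [3, 2, 4, 1, 1]
list2 = ['three', 'two', 'four', 'one', 'one2']

# Convert once; asarray avoids a copy if the input is already an array.
keys = np.asarray(list1)
values = np.asarray(list2)

# kind='stable' keeps tied keys in their original relative order.
idx = np.argsort(keys, kind='stable')

# tolist() converts back to plain Python lists of int and str.
sorted_keys = keys[idx].tolist()
sorted_values = values[idx].tolist()

print(sorted_keys)
print(sorted_values)
```

Only the keys are compared here, so the values can be of any dtype that NumPy can store.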
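This is the Schwartzian transform. The built-in Python sorting is stable, so the two 1s don't cause a problem. (Here the tied pairs are compared as whole tuples, so they end up ordered by their second elements.)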
>>> l1 = [3, 2, 4, 1, 1]
>>> l2 = ['three', 'two', 'four', 'one', 'second one']
>>> list(zip(*sorted(zip(l1, l2))))
[(1, 1, 2, 3, 4), ('one', 'second one', 'two', 'three', 'four')]