sorted() using generator expressions rather than lists
Solution 1:
The first thing sorted()
does is to convert the data to a list. Basically the first line (after argument validation) of the implementation is
newlist = PySequence_List(seq);
See also the full source code version 2.7 and version 3.1.2.
Edit: As pointed out in the answer by aaronasterling, the variable newlist
is, well, a new list. If the parameter is already a list, it is copied. So a generator expression really has the advantage of using less memory.
Solution 2:
The easiest way to see which is faster is to use timeit
and it tells me that it's faster to pass a list rather than a generator:
>>> import random
>>> randomlist = range(1000)
>>> random.shuffle(randomlist)
>>> import timeit
>>> timeit.timeit("sorted(x for x in randomlist)",setup = "from __main__ import randomlist",number = 10000)
4.944492386602178
>>> timeit.timeit("sorted([x for x in randomlist])",setup = "from __main__ import randomlist",number = 10000)
4.635165083830486
And:
>>> timeit.timeit("sorted(x for x in xrange(1000,1,-1))",number = 10000)
1.411807087213674
>>> timeit.timeit("sorted([x for x in xrange(1000,1,-1)])",number = 10000)
1.0734657617099401
I think this is because when sorted()
converts the incoming value to a list it can do this more quickly for something that is already a list than for a generator. The source code seems to confirm this (but this is from reading the comments rather than fully understanding everything that is going on).
Solution 3:
There's no way to sort a sequence without knowing all the elements of the sequence, so any generator passed to sorted()
is exhausted.
Solution 4:
There's a huge benefit. Because sorted doesn't affect the passed in sequence, it has to make a copy of it. If it's making a list from the generator expression, then only one list gets made. If a list comprehension is passed in, then first, that gets built and then sorted
makes a copy of it to sort.
This is reflected in the line
newlist = PySequence_List(seq);
quoted in Sven Marnach's answer. Essentially, this will unconditionally make a copy of whatever sequence is passed to it.