sorted() using generator expressions rather than lists

Solution 1:

The first thing sorted() does is to convert the data to a list. Basically the first line (after argument validation) of the implementation is

newlist = PySequence_List(seq);

See also the full source code version 2.7 and version 3.1.2.

Edit: As pointed out in the answer by aaronasterling, the variable newlist is, well, a new list. If the parameter is already a list, it is copied. So a generator expression really has the advantage of using less memory.

Solution 2:

The easiest way to see which is faster is to use timeit and it tells me that it's faster to pass a list rather than a generator:

>>> import random
>>> randomlist = range(1000)
>>> random.shuffle(randomlist)
>>> import timeit
>>> timeit.timeit("sorted(x for x in randomlist)",setup = "from __main__ import randomlist",number = 10000)
4.944492386602178
>>> timeit.timeit("sorted([x for x in randomlist])",setup = "from __main__ import randomlist",number = 10000)
4.635165083830486

And:

>>> timeit.timeit("sorted(x for x in xrange(1000,1,-1))",number = 10000)
1.411807087213674
>>> timeit.timeit("sorted([x for x in xrange(1000,1,-1)])",number = 10000)
1.0734657617099401

I think this is because when sorted() converts the incoming value to a list it can do this more quickly for something that is already a list than for a generator. The source code seems to confirm this (but this is from reading the comments rather than fully understanding everything that is going on).

Solution 3:

There's no way to sort a sequence without knowing all the elements of the sequence, so any generator passed to sorted() is exhausted.

Solution 4:

There's a huge benefit. Because sorted doesn't affect the passed in sequence, it has to make a copy of it. If it's making a list from the generator expression, then only one list gets made. If a list comprehension is passed in, then first, that gets built and then sorted makes a copy of it to sort.

This is reflected in the line

newlist = PySequence_List(seq);

quoted in Sven Marnach's answer. Essentially, this will unconditionally make a copy of whatever sequence is passed to it.