Advanced Nested List Comprehension Syntax
I was playing around with list comprehensions to get a better understanding of them and I ran into some unexpected output that I am not able to explain. I haven't found this question asked before, but if it /is/ a repeat question, I apologize.
I was essentially trying to write a generator which generated generators. A simple generator expression, which uses the same syntax as a list comprehension, would look like this:
(x for x in range(10) if x%2==0) # generates all even integers in range(10)
What I was trying to do was write a generator that generated two generators - the first of which generated the even numbers in range(10) and the second of which generated the odd numbers in range(10). For this, I did:
>>> (x for x in range(10) if x%2==i for i in range(2))
<generator object <genexpr> at 0x7f6b90948f00>
>>> for i in g.next(): print i
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment
>>> g.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> g = (x for x in range(10) if x%2==i for i in range(2))
>>> g
<generator object <genexpr> at 0x7f6b90969730>
>>> g.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment
I don't understand why 'i' is being referenced before assignment.
I thought it might have had something to do with i in range(2), so I did:
>>> g = (x for x in range(10) if x%2==i for i in [0.1])
>>> g
<generator object <genexpr> at 0x7f6b90948f00>
>>> g.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment
This didn't make sense to me, so I thought it best to try something simpler first. So I went back to lists and tried:
>>> [x for x in range(10) if x%2==i for i in range(2)]
[1, 1, 3, 3, 5, 5, 7, 7, 9, 9]
which I expected to be the same as:
>>> l = []
>>> for i in range(2):
...     for x in range(10):
...         if x%2==i:
...             l.append(x)
...
>>> l
[0, 2, 4, 6, 8, 1, 3, 5, 7, 9] # so where is my list comprehension malformed?
But when I tried it on a hunch, this worked:
>>> [[x for x in range(10) if x%2==i] for i in range(2)]
[[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]] # so nested lists in nested list comprehension somehow affect the scope of if statements? :S
So I thought it might be a problem with what level of scope the if statement operates in. So I tried this:
>>> [x for x in range(10) for i in range(2) if x%2==i]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
And now I'm thoroughly confused. Can someone please explain this behavior? I don't understand why my list comprehensions seem to be malformed, nor do I understand how the scoping of the if statements works.
PS: While proof-reading the question, I realized that this does look a bit like a homework question - it is not.
Solution 1:
You need to put parentheses around the inner generator expression:
((x for x in range(10) if x%2==i) for i in range(2))
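As a quick check, here is a minimal sketch of how the parenthesized form can be consumed. It assumes Python 3, where g.next() is spelled next(g); the names outer, evens and odds are only illustrative.

```python
# Outer generator expression: yields one inner generator per value of i.
outer = ((x for x in range(10) if x % 2 == i) for i in range(2))

# Consume each inner generator before asking the outer one for the next,
# so that i still has the value the inner generator was created for.
evens = list(next(outer))
odds = list(next(outer))

print(evens)
print(odds)
```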
Quoting the question: "This didn't make sense to me, so I thought it best to try something simpler first. So I went back to lists and tried:"
>>> [x for x in range(10) if x%2==i for i in range(2)]
[1, 1, 3, 3, 5, 5, 7, 7, 9, 9]
That worked only because a previous list comprehension had leaked its i variable into the enclosing scope, and that leftover i became the i read by the if clause of the current one. Start a fresh Python interpreter and the same expression fails with a NameError. This leaking of the loop variable has been removed in Python 3.
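For comparison, a small sketch of what happens under Python 3 (an assumption, since the question uses Python 2). There the comprehension has its own scope, so even an i that already exists outside is not picked up, and the misordered clauses fail. The helper name misordered is hypothetical.

```python
def misordered():
    i = 1  # stands in for a value left over from earlier code
    try:
        # The if clause reads i before the trailing "for i" clause binds it.
        return [x for x in range(10) if x % 2 == i for i in range(2)]
    except NameError as exc:
        # UnboundLocalError is a subclass of NameError, so it is caught here.
        return type(exc).__name__


print(misordered())
```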
EDIT:
The equivalent for loop for:
(x for x in range(10) if x%2==i for i in range(2))
would be:
l = []
for x in range(10):
    if x%2 == i:
        for i in range(2):
            l.append(x)
which also raises a NameError.
EDIT2:
The parenthesized version:
((x for x in range(10) if x%2==i) for i in range(2))
is equivalent to:
li = []
for i in range(2):
    lx = []
    for x in range(10):
        if x%2==i:
            lx.append(x)
    li.append(lx)
Solution 2:
Lie Ryan's for-loop equivalent leads me to the following, which does seem to work just fine:
[x for i in range(2) for x in range(10) if i == x%2]
outputs
[0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
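The rule behind this is that the for and if clauses of a comprehension nest from left to right, in the same order the equivalent loops would be written from the outside in. A minimal sketch comparing the two forms (the function names are illustrative only):

```python
def with_loops():
    result = []
    for i in range(2):              # first clause: outermost loop
        for x in range(10):         # second clause: nested inside it
            if x % 2 == i:          # last clause: innermost test
                result.append(x)
    return result


def with_comprehension():
    # Same clauses, same order, read left to right.
    return [x for i in range(2) for x in range(10) if x % 2 == i]


print(with_loops() == with_comprehension())
```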
Solution 3:
Expanding on Lie Ryan's answer a bit:
something = (x for x in range(10) if x%2==i for i in range(2))
is equivalent to:
def _gen1():
    for x in range(10):
        if x%2 == i:
            for i in range(2):
                yield x
something = _gen1()
whereas the parenthesised version is equivalent to:
def _gen1():
    def _gen2():
        for x in range(10):
            if x%2 == i:
                yield x
    for i in range(2):
        yield _gen2()
something = _gen1()
This does actually yield the two generators:
[<generator object <genexpr> at 0x02A0A968>, <generator object <genexpr> at 0x02A0A990>]
Unfortunately, the generators it yields are fragile, because their output depends on how you consume them:
>>> gens = ((x for x in range(10) if x%2==i) for i in range(2))
>>> for g in gens:
...     print(list(g))
...
[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]
>>> gens = ((x for x in range(10) if x%2==i) for i in range(2))
>>> for g in list(gens):
...     print(list(g))
...
[1, 3, 5, 7, 9]
[1, 3, 5, 7, 9]
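One way to see why the second run prints the odd numbers twice: an inner generator does not store the value of i when it is created; it looks i up each time its if test runs. A sketch of the same effect with plain generator functions, assuming Python 3 and using hypothetical names:

```python
def make_pair():
    def members():
        for x in range(10):
            # i is read from the enclosing function when this line runs,
            # not when members() was called.
            if x % 2 == i:
                yield x

    gens = []
    for i in range(2):
        gens.append(members())
    # The loop has finished, so i is now 1 for every generator in gens.
    return gens


late = [list(g) for g in make_pair()]
print(late)
```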
My advice is to write the generator functions out in full: I think trying to get the correct scoping on i without doing that may be all but impossible.
Solution 4:
Lie has the answer to the syntactical question. A suggestion: don't stuff so much into the body of a generator. A function is much more readable.
def make_generator(modulus):
    return (x for x in range(10) if x % 2 == modulus)
g = (make_generator(i) for i in range(2))
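A short sketch of why this helps: the function call fixes modulus for each generator at the moment it is created, so the result should no longer depend on the order of consumption. This assumes Python 3; the names eager and lazy are illustrative.

```python
def make_generator(modulus):
    # modulus is a local of this call, so each generator keeps its own value.
    return (x for x in range(10) if x % 2 == modulus)


# Collect all the generators first, then consume them.
eager = [list(g) for g in list(make_generator(i) for i in range(2))]

# Consume each generator as soon as it is produced.
lazy = [list(g) for g in (make_generator(i) for i in range(2))]

print(eager == lazy)
```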