Advanced Nested List Comprehension Syntax

I was playing around with list comprehensions to get a better understanding of them and I ran into some unexpected output that I am not able to explain. I haven't found this question asked before, but if it /is/ a repeat question, I apologize.

I was essentially trying to write a generator which generated generators. A simple generator expression (the lazy counterpart of a list comprehension) looks like this:

(x for x in range(10) if x%2==0) # generates all even integers in range(10)
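For reference, a generator expression behaves like a small generator function. The following Python 3 sketch (the name `evens_below` is hypothetical, chosen only for illustration) produces the same values both ways:

```python
# Generator expression from above: lazily yields the even numbers in range(10)
gen_expr = (x for x in range(10) if x % 2 == 0)

# Hypothetical hand-written generator function with the same behavior
def evens_below(limit):
    for x in range(limit):
        if x % 2 == 0:
            yield x

print(list(gen_expr))         # [0, 2, 4, 6, 8]
print(list(evens_below(10)))  # [0, 2, 4, 6, 8]
```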

What I was trying to do was write a generator that generated two generators - the first of which generated the even numbers in range(10) and the second of which generated the odd numbers in range(10). For this, I did:

>>> g = (x for x in range(10) if x%2==i for i in range(2))
>>> g
<generator object <genexpr> at 0x7f6b90948f00>

>>> for i in g.next(): print i
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> g = (x for x in range(10) if x%2==i for i in range(2))
>>> g
<generator object <genexpr> at 0x7f6b90969730>
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment

I don't understand why 'i' is being referenced before assignment.

I thought it might have had something to do with i in range(2), so I did:

>>> g = (x for x in range(10) if x%2==i for i in [0.1])
>>> g
<generator object <genexpr> at 0x7f6b90948f00>
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment

This didn't make sense to me, so I thought it best to try something simpler first. So I went back to lists and tried:

>>> [x for x in range(10) if x%2==i for i in range(2)]
[1, 1, 3, 3, 5, 5, 7, 7, 9, 9]

which I expected to be the same as:

>>> l = []
>>> for i in range(2):
...     for x in range(10):
...         if x%2==i:
...             l.append(x)
... 
>>> l
[0, 2, 4, 6, 8, 1, 3, 5, 7, 9] # so where is my list comprehension malformed?

But when I tried it on a hunch, this worked:

>>> [[x for x in range(10) if x%2==i] for i in range(2)]
[[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]] # so nested lists in nested list comprehension somehow affect the scope of if statements? :S

So I thought it might be a problem with what level of scope the if statement operates in. So I tried this:

>>> [x for x in range(10) for i in range(2) if x%2==i]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

And now I'm thoroughly confused. Can someone please explain this behavior? I don't understand why my list comprehensions seem to be malformed, nor do I understand how the scoping of the if statements works.

PS: While proof-reading the question, I realized that this does look a bit like a homework question - it is not.


Solution 1:

You need to use some parentheses:

((x for x in range(10) if x%2==i) for i in range(2))
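A minimal Python 3 sketch of the parenthesized form, consuming each inner generator as soon as the outer one produces it:

```python
# Outer generator yields one inner generator per remainder i
gens = ((x for x in range(10) if x % 2 == i) for i in range(2))

# Consume each inner generator right after it is produced,
# while the outer generator is still paused at that value of i
result = [list(g) for g in gens]
print(result)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

Consuming them in this order matters; Solution 3 shows what happens when the outer generator is exhausted first.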

Regarding this part of the question:

> This didn't make sense to me, so I thought it best to try something simpler first. So I went back to lists and tried:
>
> >>> [x for x in range(10) if x%2==i for i in range(2)]
> [1, 1, 3, 3, 5, 5, 7, 7, 9, 9]

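That only appeared to work because in Python 2 a list comprehension runs in the enclosing scope, so its loop variable leaks into that scope. An earlier list comprehension (or for loop) had left i bound to 1, and that leftover i is what the if clause compared against. Each odd x then passes the test and is appended twice, once per iteration of the inner for i in range(2), which leaves i at 1 again. Try the same line in a fresh Python interpreter and it fails with a NameError. The loop variable's leaking behavior has been removed in Python 3.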
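A short Python 3 check of the leaking claim (the variable names are illustrative): the comprehension's loop variable stays inside the comprehension, while a plain for loop still rebinds the name in the enclosing scope.

```python
i = "outer"

# In Python 3 a list comprehension has its own scope,
# so its loop variable does not overwrite the outer i
evens = [i for i in range(0, 10, 2)]
after_comprehension = i   # still "outer"

# An ordinary for loop, by contrast, binds i in the enclosing scope
for i in range(2):
    pass
after_loop = i            # 1

print(evens, after_comprehension, after_loop)
```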

EDIT:

The equivalent for loop for:

(x for x in range(10) if x%2==i for i in range(2))

would be:

l = []
for x in range(10):
    if x%2 == i:
        for i in range(2):
            l.append(x)

which also gives a NameError when run at module level in a fresh interpreter.
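At module level that loop fails with a NameError. Wrapped in a function, it fails with UnboundLocalError instead, which is exactly what the generator expression raises, since a generator expression also runs in its own function-like scope where i is a local. A Python 3 sketch (the function name `misordered` is made up for illustration):

```python
def misordered():
    # Mirrors the loop expansion above; i is assigned by the inner
    # loop, so it is a local of this function for the whole body
    l = []
    for x in range(10):
        if x % 2 == i:          # read before any assignment
            for i in range(2):
                l.append(x)
    return l

loop_error = None
try:
    misordered()
except UnboundLocalError as exc:
    loop_error = exc

# The generator expression fails the same way on its first next() ...
g = (x for x in range(10) if x % 2 == i for i in range(2))
genexp_error = None
try:
    next(g)
except UnboundLocalError as exc:
    genexp_error = exc

# ... and a generator that raised is finished, hence the StopIteration
# on the question's second g.next()
exhausted = False
try:
    next(g)
except StopIteration:
    exhausted = True

print(type(loop_error).__name__, type(genexp_error).__name__, exhausted)
```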

EDIT2:

The parenthesized version:

((x for x in range(10) if x%2==i) for i in range(2))

is roughly equivalent to the following, with lists standing in for the lazy generators:

li = []
for i in range(2):
    lx = []
    for x in range(10):
        if x%2==i:
            lx.append(x)
    li.append(lx)
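As a sanity check in Python 3, the list-of-lists comprehension from the question produces the same result as these explicit loops (the wrapper function `nested_loops` exists only for the comparison):

```python
# EDIT2's explicit loops, wrapped in a function for comparison
def nested_loops():
    li = []
    for i in range(2):
        lx = []
        for x in range(10):
            if x % 2 == i:
                lx.append(x)
        li.append(lx)
    return li

# List counterpart of the parenthesized generator expression
nested_comp = [[x for x in range(10) if x % 2 == i] for i in range(2)]

print(nested_comp)                    # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print(nested_comp == nested_loops())  # True
```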

Solution 2:

Lie Ryan's for-loop equivalent leads me to the following, which does seem to work just fine:

[x for i in range(2) for x in range(10) if i == x%2]

outputs

[0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
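To confirm the ordering rule, here is a Python 3 sketch comparing this comprehension with the explicit loops from the question: the for clauses are written in the same order as the nested for statements, outermost first, and the if clause comes after the loop that binds the names it uses.

```python
# Explicit loops from the question: outer loop over i, inner over x
expected = []
for i in range(2):
    for x in range(10):
        if x % 2 == i:
            expected.append(x)

# Same nesting order, flattened into one comprehension
flat = [x for i in range(2) for x in range(10) if i == x % 2]
print(flat == expected)  # True
```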

Solution 3:

Expanding on Lie Ryan's answer a bit:

something = (x for x in range(10) if x%2==i for i in range(2))

is equivalent to:

def _gen1():
    for x in range(10):
        if x%2 == i:
            for i in range(2):
                yield x
something = _gen1()

whereas the parenthesised version is equivalent to:

def _gen1():
    def _gen2():
        for x in range(10):
            if x%2 == i:
                yield x

    for i in range(2):
        yield _gen2()
something = _gen1()

Consuming it, for example with list(something), does actually yield the two generators:

[<generator object <genexpr> at 0x02A0A968>, <generator object <genexpr> at 0x02A0A990>]

Unfortunately the generators it yields are somewhat unstable: each inner generator looks up i when it runs, not when it is created, so the output depends on how you consume them:

>>> gens = ((x for x in range(10) if x%2==i) for i in range(2))
>>> for g in gens:
...     print(list(g))
...
[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]
>>> gens = ((x for x in range(10) if x%2==i) for i in range(2))
>>> for g in list(gens):
...     print(list(g))
...
[1, 3, 5, 7, 9]
[1, 3, 5, 7, 9]
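The instability comes from late binding: an inner generator expression does not copy i when it is created, it reads i from the outer generator's scope each time it runs. list(gens) drives the outer generator to completion first, so by then i is 1 for both inner generators. The same effect is easy to reproduce with lambdas in Python 3:

```python
# Each lambda refers to the variable i, not to its value at creation time
funcs = [lambda: i for i in range(2)]
values = [f() for f in funcs]
print(values)   # [1, 1]

# Materializing the outer generator first has the same effect
gens = list((x for x in range(10) if x % 2 == i) for i in range(2))
results = [list(g) for g in gens]
print(results)  # [[1, 3, 5, 7, 9], [1, 3, 5, 7, 9]]
```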

My advice is to write the generator functions out in full: I think trying to get the correct scoping on i without doing that may be all but impossible.
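A minimal Python 3 sketch of that advice, with hypothetical names `with_remainder` and `evens_then_odds`: passing the remainder as an argument binds it at the moment each inner generator is created, so the result no longer depends on consumption order.

```python
# Hypothetical helper: yields the numbers below limit with the given
# remainder mod 2; remainder is a parameter, so each call has its own binding
def with_remainder(remainder, limit=10):
    for x in range(limit):
        if x % 2 == remainder:
            yield x

# Hypothetical outer generator: one inner generator per remainder
def evens_then_odds(limit=10):
    for remainder in range(2):
        yield with_remainder(remainder, limit)

# Stable even if the outer generator is exhausted first
gens = list(evens_then_odds())
results = [list(g) for g in gens]
print(results)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```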

Solution 4:

Lie has the answer to the syntactical question. A suggestion: don't stuff so much into the body of a generator. A function is much more readable.

def make_generator(modulus):
    # modulus is bound when make_generator is called, so each
    # returned generator keeps its own value
    return (x for x in range(10) if x % 2 == modulus)

g = (make_generator(i) for i in range(2))
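A quick Python 3 check that this version stays stable even when the outer generator is materialized before any inner one runs (the function is repeated so the snippet runs on its own):

```python
def make_generator(modulus):
    return (x for x in range(10) if x % 2 == modulus)

# Materialize the outer generator before consuming any inner one
g = (make_generator(i) for i in range(2))
gens = list(g)
results = [list(inner) for inner in gens]
print(results)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

Unlike the bare nested generator expression, the result is the same whichever way the generators are consumed, because each inner generator closes over its own modulus rather than a shared i.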