Nested list comprehension scope

The best way to explain my question is with an example:

example.py:

class A(object):
    integers = [1, 2, 3]
    singles = [i for i in integers]

class B(object):
    integers = [1, 2, 3]
    pairs = [(i, j) for i in integers for j in integers]

When I run this under Python 2 it works fine, but under Python 3 I get a NameError for class B (but not class A):

$ python example.py
Traceback (most recent call last):
  File "example.py", line 6, in <module>
    class B(object):
  File "example.py", line 8, in B
    pairs = [(i, j) for i in integers for j in integers]
  File "example.py", line 8, in <listcomp>
    pairs = [(i, j) for i in integers for j in integers]
NameError: global name 'integers' is not defined

Why does only class B raise a NameError and why only under Python 3?


Class scopes are a bit strange in Python 3, but it's for a good reason.

In Python 2, the iteration variables (i and j in your examples) leaked out of list comprehensions and would be included in the outside scope. This is because list comprehensions were designed early in Python 2's life and were modeled on explicit for loops, which leak their loop variables in the same way. To see how surprising this can be, check the values of B.i and B.j in Python 2, where you didn't get an error!

In Python 3, list comprehensions were changed to prevent this leaking. They are now implemented with a function (which has its own scope) that is called to produce the list value. This makes them work the same as generator expressions, which have always been functions under the covers.
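A quick way to check the no-leak behavior for yourself is something like the sketch below. The class name Leak and the variable leaked are made up for the illustration; they are not part of your example.

```python
# Hypothetical class, used only to illustrate the scoping change.
class Leak(object):
    integers = [1, 2, 3]
    singles = [i for i in integers]

# Under Python 3 the loop variable stays inside the comprehension,
# so it never becomes an attribute of the class.
# (Under Python 2 the same check would find a leaked attribute "i".)
leaked = hasattr(Leak, "i")
print(leaked)
```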

A consequence of this is that in a class, a list comprehension usually can't see any class variables. This is parallel to a method not being able to see class variables directly (only through self or the explicit class name). For example, calling the method in the class below will give the same NameError exception you are seeing in your list comprehension:

class Foo:
    classvar = "bar"
    def blah(self):
        print(classvar) # raises "NameError: global name 'classvar' is not defined"
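For contrast, here is a small sketch of the lookups that do work from inside a method. The class and method names (FooFixed, via_self, via_class, bare) are invented for this illustration, and it assumes there is no module-level variable called classvar.

```python
class FooFixed:
    classvar = "bar"

    def via_self(self):
        # Attribute lookup on the instance falls back to the class.
        return self.classvar

    def via_class(self):
        # Explicit lookup through the class name.
        return FooFixed.classvar

    def bare(self):
        # A bare name skips the class scope entirely, so this fails.
        return classvar

foo = FooFixed()
try:
    foo.bare()
except NameError as exc:
    bare_error = exc
else:
    bare_error = None
```

Only the bare name fails; both qualified lookups return "bar".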

There is an exception, however: The sequence being iterated over by the first for clause of a list comprehension is evaluated outside of the inner function. This is why your A class works in Python 3. It does this so that generators can catch non-iterable objects immediately (rather than only when next is called on them and their code runs).

But this exception doesn't apply to the inner for clause of the two-level comprehension in class B.
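The same rule for the first for clause applies to generator expressions, which makes it easy to see in isolation. This is only a sketch; the class name E and the attribute lazy are made up for the example.

```python
class E(object):
    integers = [1, 2, 3]
    # "integers" is the iterable of the first for clause, so the name is
    # looked up in the class body when this line runs, not later inside
    # the generator's own scope.
    lazy = (i * 2 for i in integers)

# The generator body runs here, long after the class body has finished,
# and it still works because it was handed the iterable up front.
doubled = list(E.lazy)
```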

You can see the difference if you disassemble some functions that create list comprehensions using the dis module:

import dis

def f(lst):
    return [i for i in lst]

def g(lst):
    return [(i, j) for i in lst for j in lst]

Here's the disassembly of f:

>>> dis.dis(f)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x0000000003CCA1E0, file "<pyshell#374>", line 2>) 
              3 LOAD_CONST               2 ('f.<locals>.<listcomp>') 
              6 MAKE_FUNCTION            0 
              9 LOAD_FAST                0 (lst) 
             12 GET_ITER             
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             16 RETURN_VALUE       

The first three lines show f loading up a precompiled code block and creating a function out of it (it names it f.<locals>.<listcomp>). This is the function used to make the list.

The next two lines show the lst variable being loaded and an iterator being made from it. This is happening within f's scope, not the inner function's. Then the <listcomp> function is called with that iterator as its argument.

Class A behaves the same way. The iterator is made from the class variable integers in the class body itself, just as any other statement in a class body can refer to previously defined class members when defining a new one.

Now, compare the disassembly of g, which makes pairs by iterating over the same list twice:

>>> dis.dis(g)
  2           0 LOAD_CLOSURE             0 (lst) 
              3 BUILD_TUPLE              1 
              6 LOAD_CONST               1 (<code object <listcomp> at 0x0000000003CCA810, file "<pyshell#377>", line 2>) 
              9 LOAD_CONST               2 ('g.<locals>.<listcomp>') 
             12 MAKE_CLOSURE             0 
             15 LOAD_DEREF               0 (lst) 
             18 GET_ITER             
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair) 
             22 RETURN_VALUE         

This time, it builds a closure with the code object, rather than a basic function. A closure is a function with some "free" variables that refer to things in the enclosing scope. For the <listcomp> function in g, this works just fine, since its scope is a normal one. However, when you try to use the same sort of comprehension in class B, the closure fails, since classes don't let functions they contain see into their scopes in that way (as demonstrated with the Foo class above).
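Here is a sketch that puts the two cases side by side, assuming Python 3. It also shows one possible workaround that is not part of the explanation above: passing the class variable to itertools.product, whose arguments are evaluated in the class body like any other call. The names BFixed and class_error are made up for the illustration.

```python
import itertools

def g(lst):
    # Inside a function, the inner "for j in lst" can see lst.
    return [(i, j) for i in lst for j in lst]

try:
    class B(object):
        integers = [1, 2]
        # The inner "for j in integers" cannot see the class variable.
        pairs = [(i, j) for i in integers for j in integers]
except NameError as exc:
    class_error = exc
else:
    class_error = None

class BFixed(object):
    integers = [1, 2]
    # Assumed workaround: the argument is evaluated in the class body,
    # so no comprehension scope is involved in looking up "integers".
    pairs = list(itertools.product(integers, repeat=2))
```

Whether you prefer this over moving the comprehension out of the class body is a matter of taste.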

It's worth noting that inner sequence values aren't the only thing that causes this issue. As in the previous question linked to by BrenBarn in a comment, you'll have the same issue if a class variable is referred to elsewhere in the list comprehension:

class C:
    num = 5
    products = [i * num for i in range(10)] # raises a NameError about num

You don't, however, get an error from multi-level list comprehensions where the inner for (or if) clauses only refer to the results of the preceding loops. This is because those values aren't part of a closure, just local variables inside the <listcomp> function's scope.

class D:
    nested = [[1, 2, 3], [4, 5, 6]]
    flattened = [item for inner in nested for item in inner] # works!
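If you want to confirm both behaviors in one run, a sketch like this should do it under Python 3. It assumes there is no module-level variable named num, and c_error is a name I made up to hold the caught exception.

```python
try:
    class C:
        num = 5
        # "num" is used in the body of the comprehension, so it would have
        # to come from the class scope, which the comprehension can't see.
        products = [i * num for i in range(10)]
except NameError as exc:
    c_error = exc
else:
    c_error = None

class D:
    nested = [[1, 2, 3], [4, 5, 6]]
    # Only the first iterable ("nested") comes from the class scope;
    # "inner" is a local variable of the comprehension itself.
    flattened = [item for inner in nested for item in inner]
```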

Like I said, class scopes are a bit strange.