Why can I use the same name for iterator and sequence in a Python for loop?

This is more of a conceptual question. I recently saw a piece of code in Python (it worked in 2.7, and it might also have been run in 2.5 as well) in which a for loop used the same name for both the list that was being iterated over and the item in the list, which strikes me as both bad practice and something that should not work at all.

For example:

x = [1,2,3,4,5]
for x in x:
    print x
print x

Yields:

Now, it makes sense to me that the last value printed would be the last value assigned to x from the loop, but I fail to understand why you'd be able to use the same variable name for both your parts of the for loop and have it function as intended. Are they in different scopes? What's going on under the hood that allows something like this to work?

What does dis tell us:

Python 3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dis import dis
>>> dis("""x = [1,2,3,4,5]
... for x in x:
...     print(x)
... print(x)""")

  1           0 LOAD_CONST               0 (1)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              9 LOAD_CONST               3 (4)
             12 LOAD_CONST               4 (5)
             15 BUILD_LIST               5
             18 STORE_NAME               0 (x)

  2          21 SETUP_LOOP              24 (to 48)
             24 LOAD_NAME                0 (x)
             27 GET_ITER
        >>   28 FOR_ITER                16 (to 47)
             31 STORE_NAME               0 (x)

  3          34 LOAD_NAME                1 (print)
             37 LOAD_NAME                0 (x)
             40 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             43 POP_TOP
             44 JUMP_ABSOLUTE           28
        >>   47 POP_BLOCK

  4     >>   48 LOAD_NAME                1 (print)
             51 LOAD_NAME                0 (x)
             54 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             57 POP_TOP
             58 LOAD_CONST               5 (None)
             61 RETURN_VALUE

The key bits are sections 2 and 3 - we load the value out of x (24 LOAD_NAME 0 (x)) and then we get its iterator (27 GET_ITER) and start iterating over it (28 FOR_ITER). Python never goes back to load the iterator again.

Aside: It wouldn't make any sense to do so, since it already has the iterator, and as Abhijit points out in his answer, Section 7.3 of Python's specification actually requires this behavior).

When the name x gets overwritten to point at each value inside of the list formerly known as x Python doesn't have any problems finding the iterator because it never needs to look at the name x again to finish the iteration protocol.

Using your example code as the core reference

x = [1,2,3,4,5]
for x in x:
    print x
print x

I would like you to refer the section 7.3. The for statement in the manual

Excerpt 1

The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list.

What it means is that your variable x, which is a symbolic name of an object list : [1,2,3,4,5] is evaluated to an iterable object. Even if the variable, the symbolic reference changes its allegiance, as the expression-list is not evaluated again, there is no impact to the iterable object that has already been evaluated and generated.

Note

Everything in Python is an Object, has an Identifier, attributes and methods.
Variables are Symbolic name, a reference to one and only one object at any given instance.
Variables at run-time can change its allegiance i.e. can refer to some other object.

Excerpt 2

The suite is then executed once for each item provided by the iterator, in the order of ascending indices.

Here the suite refers to the iterator and not to the expression-list. So, for each iteration, the iterator is executed to yield the next item instead of referring to the original expression-list.

It is necessary for it to work this way, if you think about it. The expression for the sequence of a for loop could be anything:

binaryfile = open("file", "rb")
for byte in binaryfile.read(5):
    ...

We can't query the sequence on each pass through the loop, or here we'd end up reading from the next batch of 5 bytes the second time. Naturally Python must in some way store the result of the expression privately before the loop begins.

Are they in different scopes?

No. To confirm this you could keep a reference to the original scope dictionary (locals()) and notice that you are in fact using the same variables inside the loop:

x = [1,2,3,4,5]
loc = locals()
for x in x:
    print locals() is loc  # True
    print loc["x"]  # 1
    break

What's going on under the hood that allows something like this to work?

Sean Vieira showed exactly what is going on under the hood, but to describe it in more readable python code, your for loop is essentially equivalent to this while loop:

it = iter(x)
while True:
    try:
        x = it.next()
    except StopIteration:
        break
    print x

This is different from the traditional indexing approach to iteration you would see in older versions of Java, for example:

for (int index = 0; index < x.length; index++) {
    x = x[index];
    ...
 }

This approach would fail when the item variable and the sequence variable are the same, because the sequence x would no longer be available to look up the next index after the first time x was reassigned to the first item.

With the former approach, however, the first line (it = iter(x)) requests an iterator object which is what is actually responsible for providing the next item from then on. The sequence that x originally pointed to no longer needs to be accessed directly.

It's the difference between a variable (x) and the object it points to (the list). When the for loop starts, Python grabs an internal reference to the object pointed to by x. It uses the object and not what x happens to reference at any given time.

If you reassign x, the for loop doesn't change. If x points to a mutable object (e.g., a list) and you change that object (e.g., delete an element) results can be unpredictable.

Basically, the for loop takes in the list x, and then, storing that as a temporary variable, reassigns a x to each value in that temporary variable. Thus, x is now the last value in the list.

>>> x = [1, 2, 3]
>>> [x for x in x]
[1, 2, 3]
>>> x
3
>>>

Just like in this:

>>> def foo(bar):
...     return bar
... 
>>> x = [1, 2, 3]
>>> for x in foo(x):
...     print x
... 
1
2
3
>>>

In this example, x is stored in foo() as bar, so although x is being reassigned, it still exist(ed) in foo() so that we could use it to trigger our for loop.

Why can I use the same name for iterator and sequence in a Python for loop?

Related

Recent Posts