If two variables point to the same object, why doesn't reassigning one variable affect the other?

I am trying to understand how variables work in Python. Say I have an object stored in the variable a:

>>> a = [1, 2, 3]

If I assign a to b, both point to the same object:

>>> b = a
>>> b is a
True

But if I reassign a or b, that's no longer true:

>>> a = {'x': 'y'}
>>> a is b
False

The two variables now have different values:

>>> a
{'x': 'y'}
>>> b
[1, 2, 3]

I don't understand why the variables are different now. Why is a is b no longer true? Can someone explain what's going on?


Solution 1:

Python has names which refer to objects. Objects exist separately from names, and names exist separately from the objects they refer to.

# name a
a = 1337
    # object 1337

When assigning "a name to a name", the right-hand side is evaluated to the object it refers to. Similar to how 2 + 2 evaluates to 4, a evaluates to the original 1337.

# name b
b = a
    # object referred to by a -> 1337

At this point, we have a -> 1337 and b -> 1337 - note that neither name knows the other! If we test a is b, both names are evaluated to the same object, so the test is True.

Reassigning a name only changes what that name refers to - there is no connection by which other names could be changed as well.

# name a - reassign
a = 9001
  # object 9001

At this point, we have a -> 9001 and b -> 1337. If we now test a is b, both names are evaluated to different objects which are not the same.
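The steps above can be replayed as a short script. This is only a sketch of the same sequence; the helper names same_before and same_after are made up for illustration and are not part of the original example.

```python
a = 1337              # name a -> object 1337
b = a                 # name b -> the same object; b knows nothing about a
same_before = a is b  # both names evaluate to one object

a = 9001              # rebind a only; b's reference is untouched
same_after = a is b   # the names now evaluate to different objects

print(same_before, same_after, b)  # prints: True False 1337
```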


If you come from languages such as C, then you are used to variables containing values. For example, char a = 12 can be read as "a is a memory region containing 12". On top of that, you can have several variables use the same memory. Assigning another value to one of them changes the content of the shared memory - and therefore the value of both variables.

+- char a -+
|       12 |
+--char b -+

# a = -128

+- char a -+
|     -128 |
+--char b -+

This is not how Python works: names do not contain anything, but refer to separate values. For example, a = 12 can be read as "a is a name which refers to the value 12". On top of that, you can have several names refer to the same value - but they are still separate names, each with its own reference. Assigning another value to a name changes the reference of that name - but leaves the reference of the other name untouched.

+- name a -+ -\
               \
                --> +- <12> ---+
               /    |       12 |
+- name b -+ -/     +----------+

# a = -128
                    +- <-128> -+
+- name a -+ -----> |     -128 |
                    +----------+

                    +- <12> ---+
+- name b -+ -----> |       12 |
                    +----------+

A point of confusion is that mutable objects can appear to violate the separation of names and objects. Commonly, these are containers (e.g. list, dict, ...), and instances of user-defined classes exhibit the same behaviour by default.

# name m
m = [1337]
    # object [1337]
# name n
n = m
    # object referred to by m

Similar to a plain integer 1337, a list containing an integer [1337] is an object that can be referred to by several, independent names. As above, n is m evaluates to True and m = [9001] does not change n.

However, certain operations on a name change the value seen by the name and all aliases.

# inplace add to m
m += [9001]

After this operation, m == [1337, 9001] and n is m still holds true. In fact, the value seen by n has also changed to [1337, 9001]. This appears to violate the above behaviour, in which aliases did not influence each other.

This is because m += [9001] did not change what m refers to. It only changed the content of the list that m (and the alias n) referred to. Both m and n still refer to the original list object, whose value was changed.

+- name m -+ -\
               \                  
                --> +- […] -+     +--- <@0> -+
               /    |    @0 |  -> |     1337 |
+- name n -+ -/     +-------+     +----------+

# m += [9001]

+- name m -+ -\
               \                  
                --> +- […] -+     +--- <@0> -++--- <@1> -+
               /    | @0 @1 |  -> |     1337 ||     9001 |
+- name n -+ -/     +-------+     +----------++----------+
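To contrast the two cases side by side, here is a minimal sketch that first mutates the shared list in place and then rebinds m to a newly built list. The variable names mutated_shared and rebound_shared, and the extra value 42, are invented for this illustration.

```python
m = [1337]
n = m                     # n is an alias for the same list object

m += [9001]               # in-place: mutates the list both names refer to
mutated_shared = n is m   # still the same object
seen_by_n = list(n)       # copy of what n sees after the mutation

m = m + [42]              # builds a new list and rebinds m only
rebound_shared = n is m   # m and n now refer to different lists

print(n, m)               # prints: [1337, 9001] [1337, 9001, 42]
```

The difference is that += on a list extends the existing object, while m = m + [...] evaluates the right-hand side to a new object and then reassigns the name.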

Solution 2:

"Say I have an object stored in the variable a" - that is where you are going wrong.

Python objects are not stored in variables, they are referred to by variables.

a = [1, 2, 3]
b = a

a and b refer to the same object. The list object has a reference count of 2, since there are two names referring to it.

a = {'x': 'y'}

a no longer refers to the same list object, instead it now refers to a dict object. That decrements the reference count on the list object, but b still refers to it so the object's reference count is now 1.

b = None

That means that b now refers to the None object (which has a very high reference count, lots of names refer to None). The list object gets its reference count decremented again and it falls to zero. At this point the list object can be garbage collected and the memory freed (when that happens is not guaranteed).

See also sys.getrefcount
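A rough way to watch this happen, assuming CPython (reference counts are an implementation detail and other implementations may report something else). sys.getrefcount includes a temporary reference held by the call itself, so the sketch below only compares differences against a baseline; base, with_alias and after_rebind are names made up for the example.

```python
import sys

a = [1, 2, 3]
base = sys.getrefcount(a)                 # baseline: name a plus the call's own temporary reference

b = a
with_alias = sys.getrefcount(a) - base    # one extra name now refers to the list

a = {'x': 'y'}                            # a refers to a dict; only b keeps the list alive
after_rebind = sys.getrefcount(b) - base  # back to the baseline

print(with_alias, after_rebind)
```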

Solution 3:

In Python, all variables are stored in dictionaries, or structures which seem a lot like dictionaries (e.g. locals() can expose the current scope/namespace as a dictionary).

Note: PyObject* is a CPython concept. I am not sure how things work in other Python implementations.

So it is flawed to view Python variables like C's, where they have precise memory locations. Their values are PyObject* (pointers, or memory locations), not the actual primitive values. Since variables themselves are just entries in a dictionary which hold PyObject* pointers, changing the value of a variable actually gives it a different memory address to point to.

In CPython, it is these PyObject* values which are used by id and is (a is b is the same as id(a) == id(b)).
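A small sketch of both points, using locals() inside a function to expose the namespace as a dictionary. The function namespace_demo and the variable names are hypothetical, chosen only for this illustration.

```python
def namespace_demo():
    a = [1, 2, 3]
    b = a
    names = dict(locals())   # snapshot of this scope: name -> object
    return names, a, b

names, a, b = namespace_demo()

# Both dictionary entries lead to the very same list object.
same_entry = names['a'] is names['b']

# For objects that are alive at the same time, `is` agrees with comparing id().
is_matches_id = (a is b) == (id(a) == id(b))

print(sorted(names), same_entry, is_matches_id)
```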

For example, let's consider the simple line of code:

# x: int
x += 1

This line actually changes the memory location associated with the variable, because it follows this logic:

LOAD_FAST (x)
LOAD_CONST (1)
INPLACE_ADD
STORE_FAST (x)

This is the bytecode (as emitted by CPython before 3.11; newer versions use BINARY_OP in place of INPLACE_ADD), which roughly says:

  1. Look up the value of x. In CPython this is a PyObject* which points to a PyLongObject (an int in Python userland).
  2. Load the constant 1.
  3. Add the two values. This results in a new PyObject*, which also points to an int.
  4. Set the value associated with x to be this new pointer.

TL;DR: everything in Python, including primitives, is an object. Variables don't store values per se, but pointers to the objects which box them. Reassigning a variable changes the pointer associated with that name; it does not update the memory held at that location.
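To check this on your own interpreter, the sketch below disassembles a tiny function with the dis module and then confirms that x += 1 on an int rebinds the name to a different object. The function increment and the names stores_back, old and rebound are made up for the example; the exact opcode names vary between CPython versions, so they are printed rather than assumed.

```python
import dis

def increment(x):
    x += 1
    return x

# Opcode names differ between CPython versions, so just print them.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)

# Whatever the exact opcodes are, the result is stored back under the name x.
stores_back = any(name.startswith("STORE_FAST") for name in opnames)

x = 1000
old = x              # keep the original int object alive for comparison
x += 1               # builds a new int object and rebinds x to it
rebound = x is not old

print(stores_back, rebound, old, x)
```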