Are Python variables pointers? If not, what are they?
Variables in Python are just pointers, as far as I know.
Based on this rule, I can assume that the result for this code snippet:
i = 5
j = i
j = 3
print(i)
would be 3.
But the result I actually got was 5, which I did not expect.
Moreover, my Python book does cover this example:
i = [1,2,3]
j = i
i[0] = 5
print(j)
The result would be [5,2,3].
What am I understanding wrong?
We call them references. They work like this:
i = 5 # create int(5) instance, bind it to i
j = i # bind j to the same int as i
j = 3 # create int(3) instance, bind it to j
print(i) # i still bound to the int(5), j bound to the int(3)
Small ints are interned (in CPython), but that isn't important to this explanation.
i = [1,2,3] # create the list instance, and bind it to i
j = i # bind j to the same list as i
i[0] = 5 # change the first item of i
print(j) # j is still bound to the same list as i
Variables are not pointers. When you assign to a variable you are binding the name to an object. From that point onwards you can refer to the object by using the name, until that name is rebound.
In your first example the name i is bound to the value 5. Binding different values to the name j does not have any effect on i, so when you later print the value of i the value is still 5.
In your second example you bind both i and j to the same list object. When you modify the contents of the list, you can see the change regardless of which name you use to refer to the list.
Note that it would be incorrect if you said "both lists have changed". There is only one list but it has two names (i and j) that refer to it.
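One way to check the "one list, two names" point is the `is` operator, which tests object identity rather than equality. This is only a minimal sketch; the name k is introduced here purely for contrast and is not part of the question.

```python
i = [1, 2, 3]
j = i            # bind j to the same list object as i
k = [1, 2, 3]    # a separate list that merely has equal contents

i[0] = 5         # modify the one list that i and j share

print(i is j)    # True: two names, one object
print(j)         # [5, 2, 3]
print(k)         # [1, 2, 3]: the separate list is untouched
print(i is k)    # False: equal-looking lists can still be distinct objects
```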
Related documentation
- Execution Model - Naming and Binding
Python variables are names bound to objects
From the docs:
Names refer to objects. Names are introduced by name binding operations. Each occurrence of a name in the program text refers to the binding of that name established in the innermost function block containing the use.
When you do
i = 5
j = i
that's the same as doing:
i = 5
j = 5
j doesn't point to i, and after the assignment, j doesn't know that i exists. j is simply bound to whatever i was pointing to at the time of assignment.
If you did the assignments on the same line, it would look like this:
i = j = 5
And the result would be exactly the same.
Thus, later doing
i = 3
doesn't change what j is pointing to. The reverse also holds: j = 3 would not change what i is pointing to.
Your example doesn't remove the reference to the list
So when you do this:
i = [1,2,3]
j = i
It's the same as doing this:
i = j = [1,2,3]
so i and j both point to the same list. Then your example mutates the list:
i[0] = 5
Python lists are mutable objects, so when you change the list from one reference, and you look at it from another reference, you'll see the same result because it's the same list.
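Mutating an object and rebinding a name can be compared side by side. This is a small sketch that reuses the names from the question; the replacement list [9, 9, 9] is an arbitrary value chosen for illustration.

```python
i = [1, 2, 3]
j = i

i[0] = 5          # mutation: changes the list that both names refer to
print(j)          # [5, 2, 3]

j = [9, 9, 9]     # rebinding: j now refers to a different list
print(i)          # [5, 2, 3]: i is unaffected by rebinding j
print(j)          # [9, 9, 9]
print(i is j)     # False: the names no longer share an object
```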
TLDR: Python names work like pointers with automatic referencing and dereferencing, but do not allow explicit pointer operations. Other targets represent indirections, which behave similarly to pointers.
The Python language spec does not define what names and such actually are, only how they behave. However, the behaviour can be explained with pointers.
The CPython implementation uses pointers of type PyObject* under the hood. As such, it is possible to translate name semantics to pointer operations. The key is to separate names from actual objects.
The example Python code includes both names (i) and objects (5).
i = 5 # name `i` refers to object `5`
j = i # ???
j = 3 # name `j` refers to object `3`
This can be roughly translated to C code with separate names and objects.
int three=3, five=5; // objects
int *i, *j; // names
i = &five; // name `i` refers to position of object `5`
j = i; // name `j` refers to referent of `i`
j = &three; // name `j` refers to position of object `3`
The important part is that "names-as-pointers" do not store objects! We did not define *i = five, but i = &five. The names and objects exist independently of each other.
Names only point to existing objects in memory.
When assigning from name to name, no objects are exchanged! When we define j = i, this is equivalent to j = &five. Neither i nor j is connected to the other.
+- name i -+ -\
               \
                --> + <five> -+
               /    |    5    |
+- name j -+ -/     +----------+
As a result, changing the target of one name does not affect the other. It only updates what that specific name points to.
Python also has other kinds of name-like elements: attribute references (i.j), subscriptions (i[j]) and slicing (i[:j]). Unlike names, which refer directly to objects, all three indirectly refer to elements of objects.
The example code includes both names (i) and a subscription (i[0]).
i = [1,2,3] # name `i` refers to object `[1, 2, 3]`
j = i # name `j` refers to referent of `i`
i[0] = 5 # ???
A CPython list uses a C array of PyObject* pointers under the hood. This can again be roughly translated to C code with separate names and objects.
typedef struct{
int *elements[3];
} list; // length 3 `list` type
int one = 1, two = 2, three = 3, five = 5;
list values = {&one, &two, &three}; // objects
list *i, *j; // names
i = &values; // name `i` refers to object `[1, 2, 3]`
j = i; // name `j` refers to referent of `i`
i->elements[0] = &five; // leading element of `i` refers to object `5`
The important part is that we did not change any names! We did change i->elements[0], the element of an object both our names point to.
Values of existing compound objects may be changed.
When changing the value of an object through a name, names are not changed. Both i and j still refer to the same object, whose value we can change.
+- name i -+ -\
               \
                --> + <values> -+
               /    |  elements | --> [1, 2, 3]
+- name j -+ -/     +-----------+
The intermediate object behaves similarly to a pointer in that we can directly change what it points to and reference it from multiple names.
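The same behaviour can be observed from Python itself for all three kinds of indirect target: subscriptions, slicings and attribute references. This is a hedged sketch; types.SimpleNamespace only stands in for an arbitrary object with attributes, and the names box and alias are invented for this illustration.

```python
from types import SimpleNamespace

i = [1, 2, 3]
j = i                 # both names refer to the same list

i[0] = 5              # subscription target: replaces one element
i[1:] = [7, 8]        # slicing target: replaces a range of elements
print(j)              # [5, 7, 8]: seen through the other name

box = SimpleNamespace(value=1)
alias = box           # both names refer to the same namespace object

alias.value = 2       # attribute reference target
print(box.value)      # 2: seen through the other name
```

In each case the assignment changes something inside the shared object; none of the names i, j, box or alias is rebound.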