Python string interning
Solution 1:
This is implementation-specific, but your interpreter is probably interning compile-time constants and not the results of run-time expressions.
In what follows CPython 3.9.0+ is used.
In the second example, the expression "strin" + "g" is evaluated at compile time and replaced with "string". This makes the first two examples behave the same.
If we examine the bytecodes, we'll see that they are exactly the same:
# s1 = "string"
1 0 LOAD_CONST 0 ('string')
2 STORE_NAME 0 (s1)
# s2 = "strin" + "g"
2 4 LOAD_CONST 0 ('string')
6 STORE_NAME 1 (s2)
This bytecode was obtained with the following code, which prints a few more lines after those shown above:
import dis
source = 's1 = "string"\ns2 = "strin" + "g"'
code = compile(source, '', 'exec')
dis.dis(code)  # dis.dis() prints the disassembly itself and returns None
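As a cross-check that does not depend on reading the disassembly, the constants stored on the compiled code object can be inspected directly. This is a minimal sketch, assuming a CPython version that folds constant expressions at compile time (as 3.9 does); the filename "<example>" is an arbitrary placeholder.

```python
# Compile the second example on its own and look at the constants
# the compiler stored for it.
source = 's2 = "strin" + "g"'
code = compile(source, "<example>", "exec")  # "<example>" is a placeholder filename

# If the concatenation was folded at compile time, the finished
# string is already one of the code object's constants.
folded = "string" in code.co_consts
print(code.co_consts)
print(folded)
```

If folded is true, the concatenation never happens at run time, which is why the bytecode for s1 and s2 loads the same constant.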
The third example involves a run-time concatenation, the result of which is not automatically interned:
# s3a = "strin"
3 8 LOAD_CONST 1 ('strin')
10 STORE_NAME 2 (s3a)
# s3 = s3a + "g"
4 12 LOAD_NAME 2 (s3a)
14 LOAD_CONST 2 ('g')
16 BINARY_ADD
18 STORE_NAME 3 (s3)
20 LOAD_CONST 3 (None)
22 RETURN_VALUE
This bytecode was obtained with the following code, which first prints the same lines as the first block of bytecode given above:
import dis
source = (
's1 = "string"\n'
's2 = "strin" + "g"\n'
's3a = "strin"\n'
's3 = s3a + "g"')
code = compile(source, '', 'exec')
dis.dis(code)  # dis.dis() prints the disassembly itself and returns None
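The same observation can be made programmatically instead of by eye. A minimal sketch, assuming CPython: the addition opcode is named BINARY_ADD up to 3.10 and BINARY_OP from 3.11 on, so the check accepts either name, and "<example>" is a placeholder filename.

```python
import dis

# Only the third example: a name plus a literal.
source = 's3a = "strin"\ns3 = s3a + "g"'
code = compile(source, "<example>", "exec")

# Collect the opcode names of the compiled module.
opnames = [instr.opname for instr in dis.get_instructions(code)]

# A binary-add instruction means the concatenation is left for run time.
runtime_add = any(name in ("BINARY_ADD", "BINARY_OP") for name in opnames)

# The finished string is not among the precomputed constants.
prebuilt = "string" in code.co_consts
print(runtime_add, prebuilt)
```

Because one operand is a name rather than a literal, the compiler has nothing to fold, so the string "string" only comes into existence when the addition executes.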
If you were to manually sys.intern() the result of the third expression, you'd get the same object as before:
>>> import sys
>>> s3a = "strin"
>>> s3 = s3a + "g"
>>> s3 is "string"
False
>>> sys.intern(s3) is "string"
True
Also, Python 3.8 and later print a warning for the last two comparisons above:
SyntaxWarning: "is" with a literal. Did you mean "=="?
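The warning can be avoided by binding the literal to a name before comparing identities. A minimal sketch; the variable name literal is illustrative, and whether the un-interned result is a distinct object is a CPython implementation detail. Only the identity after interning is something sys.intern() itself promises.

```python
import sys

literal = "string"
s3a = "strin"
s3 = s3a + "g"              # concatenated at run time

print(s3 == literal)        # equal by value
print(s3 is literal)        # identity before interning: implementation-specific

# sys.intern() returns the canonical interned object for a given value,
# so equal strings map to the same object once both are interned.
print(sys.intern(s3) is sys.intern(literal))
```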
Solution 2:
Case 1
>>> x = "123"
>>> y = "123"
>>> x == y
True
>>> x is y
True
>>> id(x)
50986112
>>> id(y)
50986112
Case 2
>>> x = "12"
>>> y = "123"
>>> x = x + "3"
>>> x is y
False
>>> x == y
True
Now, your question is why the id is the same in case 1 and not in case 2.
In case 1, you have assigned the string literal "123" to both x and y.
Since strings are immutable, it makes sense for the interpreter to store the string literal only once and point all the variables to the same object. This is an implementation detail rather than a language guarantee.
Hence you see identical ids.
In case 2, you are rebinding x to the result of a concatenation performed at run time, which creates a new string object. Both x and y have the same value, but not the same identity.
They point to different objects in memory. Hence they have different ids and the is operator returned False.
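The distinction between equal values and identical objects can be made explicit with a small helper. A minimal sketch; compare() is a hypothetical helper, not a standard function, and the identity result for the concatenated string is what CPython typically gives, not a guarantee.

```python
def compare(a, b):
    """Return (same value, same object) for two objects."""
    return a == b, a is b

y = "123"
x = "12"
x = x + "3"   # builds a new string at run time and rebinds x

same_value, same_object = compare(x, y)
print(same_value)    # True: both hold the characters "123"
print(same_object)   # typically False on CPython, but not guaranteed

# An object is always identical to itself.
print(compare(y, y))
```

In practice, use == to compare strings and reserve is for checks against singletons such as None.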