How can a non-assigned string in Python have an address in memory?

Solution 1:

Python reuses string literals fairly aggressively. The rules by which it does so are implementation-dependent, but CPython uses two that I'm aware of:

  • Strings that contain only characters valid in Python identifiers are interned, which means they are stored in a big table and reused wherever they occur. So, no matter where you use "cat", it always refers to the same string object.
  • String literals in the same code block are reused regardless of their content and length. If you put a string literal of the entire Gettysburg Address in a function, twice, it's the same string object both times. In separate functions, they are different objects:

    def foo(): return "pack my box with five dozen liquor jugs"
    def bar(): return "pack my box with five dozen liquor jugs"
    assert foo() is bar()   # AssertionError
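To make the same-code-block rule concrete, here is a minimal sketch, assuming CPython 3. The function name same_block is made up for illustration. It checks both the identity of the two literals and how many times the string appears in the function's constants table.

```python
# Sketch, assuming CPython 3: identical literals inside one code block
# are stored once in that block's constants and therefore share an object.
SENTENCE = "pack my box with five dozen liquor jugs"

def same_block():  # hypothetical helper, not from the original answer
    a = "pack my box with five dozen liquor jugs"
    b = "pack my box with five dozen liquor jugs"
    return a is b

# Count how often the sentence appears among the function's constants.
occurrences = [c for c in same_block.__code__.co_consts if c == SENTENCE]

print(same_block())       # identity of the two literals
print(len(occurrences))   # number of copies stored in co_consts
```

On CPython the two literals come back as one object and the constants table holds a single copy. Other implementations are free to behave differently.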

Both optimizations are done at compile time (that is, when the bytecode is generated).

On the other hand, something like chr(99) + chr(97) + chr(116) is a string expression that evaluates to the string "cat". In a dynamic language like Python, its value can't be known at compile time (chr() is a built-in function, but you might have reassigned it), so it normally isn't interned. Thus its id() is different from that of "cat". However, you can force a string to be interned using the intern() function (sys.intern() in Python 3). Thus:

id(intern(chr(99) + chr(97) + chr(116))) == id("cat")   # True
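The one-liner above uses the Python 2 built-in. A rough Python 3 equivalent, assuming CPython and using variable names chosen only for illustration, might look like this:

```python
import sys

# Built at run time, so the compiler never sees the value "cat".
built = chr(99) + chr(97) + chr(116)
literal = "cat"

# Equal by value regardless of how the string was produced.
print(built == literal)

# sys.intern() returns the canonical interned copy of the string.
interned = sys.intern(built)
print(interned is literal)
```

The identity check relies on CPython interning identifier-like literals such as "cat"; it is an implementation detail, not a language guarantee.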

As others have mentioned, interning is possible because strings are immutable. It isn't possible to change "cat" to "dog", in other words. You have to generate a new string object, which means that there's no danger that other names pointing to the same string will be affected.

Just as an aside, Python also converts expressions containing only constants (like "c" + "a" + "t") to constants at compile time, as the disassembly below shows. The resulting constants are then reused according to the rules above, so they point to identical string objects.

>>> def foo(): "c" + "a" + "t"
...
>>> from dis import dis; dis(foo)
  1           0 LOAD_CONST               5 ('cat')
              3 POP_TOP
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE
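The listing above comes from an older CPython, and the exact bytecode differs between versions. A version-independent way to see the folding, sketched here for CPython 3, is to compile a snippet and look at the constants of the resulting code object (the filename "<demo>" is arbitrary):

```python
# Sketch, assuming CPython 3: compile a statement made only of constants
# and check whether the folded result was stored as a constant.
code = compile('x = "c" + "a" + "t"', "<demo>", "exec")

folded = "cat" in code.co_consts
print(folded)
print(code.co_consts)  # exact contents vary by CPython version
```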

Solution 2:

'cat' has an address because you create it in order to pass it to id(). You haven't yet bound it to a name, but the object still exists.
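A small sketch of that point: the literal is never bound to a name, yet id() still receives a real object. In CPython the returned integer happens to be the object's memory address.

```python
# No name is bound to the literal, but an object exists for the call.
address = id("cat")

print(type(address).__name__)  # id() returns an integer
print(address)                 # actual value differs from run to run
```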

Python caches and reuses short strings. But if you assemble a string by concatenation at run time, the code that searches the cache and attempts reuse is bypassed.

Note that the inner workings of the string cache are a pure implementation detail and should not be relied upon.

Solution 3:

All values must reside somewhere in memory. This is why id('cat') produces a value. You call it a "non-existent" string, but it clearly does exist; it just hasn't been assigned to a name yet.

Strings are immutable, so the interpreter can do clever things like make all instances of the literal 'cat' the same object, so that if a and b are both assigned 'cat', id(a) and id(b) are the same.

Operating on strings will produce new strings. These may or may not be the same strings as previous strings with the same content.
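Because the identity of a computed string is unpredictable, comparisons in real code should use == rather than is. A brief sketch, with variable names chosen only for illustration:

```python
literal = "cat"
joined = "".join(["c", "a", "t"])  # new string produced by an operation

# Value comparison is always reliable.
print(joined == literal)

# Identity comparison depends on the implementation; do not rely on it.
print(joined is literal)
```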

Note that all of these details are implementation details of CPython, and they can change at any time. You don't need to be concerned with these issues in actual programs.