Python string with space and without space at the end and immutability
Solution 1:
This is a quirk of how the CPython implementation chooses to cache string literals. String literals with the same contents may refer to the same string object, but they don't have to. 'string' happens to be automatically interned while 'string ' isn't, because 'string' contains only characters allowed in a Python identifier. I have no idea why that's the criterion they chose, but it is. The behavior may be different in different Python versions or implementations.
From the CPython 2.7 source code, stringobject.h, line 28:
Interning strings (ob_sstate) tries to ensure that only one string object with a given value exists, so equality tests can be one pointer comparison. This is generally restricted to strings that "look like" Python identifiers, although the intern() builtin can be used to force interning of any string.
You can see the code that does this in Objects/codeobject.c:
/* Intern selected string constants */
for (i = PyTuple_Size(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!PyString_Check(v))
        continue;
    if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
        continue;
    PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}
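The check in that loop can be approximated in Python. This is only a minimal sketch: it assumes the character set tested by all_name_chars is ASCII letters, digits, and underscore, and the helper name looks_like_identifier is hypothetical, not something CPython exposes.

```python
import string

# Assumed character set: ASCII letters, digits and underscore.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def looks_like_identifier(value):
    """Return True if every character of value is in NAME_CHARS.

    Hypothetical helper approximating the C check. Like a loop that
    never runs, it returns True for the empty string.
    """
    return all(ch in NAME_CHARS for ch in value)


print(looks_like_identifier("string"))   # True
print(looks_like_identifier("string "))  # False
```

The trailing space is the only character that makes 'string ' fail the check, which matches the difference described above.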
Also, note that interning is a separate process from the merging of string literals by the Python bytecode compiler. If you let the compiler compile the a and b assignments together, e.g. by placing them in a module or under an if True:, you would find that a and b refer to the same string object.
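One way to see the difference between compiling together and compiling separately is to drive compile() and exec() by hand. This is a rough sketch that assumes CPython; the helper same_object is a made-up name for illustration, and the result for the separately compiled case is deliberately left unstated because it depends on the interning rules of the interpreter in use.

```python
def same_object(*sources):
    """Compile and run each source on its own in one shared namespace,
    then report whether a and b ended up as the same object."""
    namespace = {}
    for source in sources:
        exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


# One compilation unit: CPython's compiler stores the literal once,
# so both names are bound to the same constant.
together = same_object("a = 'string '\nb = 'string '")

# Two compilation units: whether the objects match depends on
# interning, so it may differ between versions and implementations.
separate = same_object("a = 'string '", "b = 'string '")

print(together, separate)
```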
Solution 2:
This behavior is not consistent and, as others have mentioned, depends on the variant of Python being executed. For a deeper discussion, see this question.
If you want to make sure that the same object is being used, you can force the interning of strings with the appropriately named intern:
intern(...)
    intern(string) -> string

    ``Intern'' the given string. This enters the string in the (global) table of interned strings whose purpose is to speed up dictionary lookups. Return the string itself or the previously interned string object with the same value.
>>> a = 'string '
>>> b = 'string '
>>> id(a) == id(b)
False
>>> a = intern('string ')
>>> b = intern('string ')
>>> id(a) == id(b)
True
Note that in Python 3, intern is no longer a builtin; you have to import it explicitly with from sys import intern.
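For Python 3, the session above might look like the following sketch. The strings are built at run time with str.join so that the compiler cannot merge two identical literals, which would otherwise hide the effect; that choice is an assumption made for the demonstration, not part of the original example.

```python
from sys import intern

# Build the strings at run time so the compiler cannot merge them.
a = "".join(["string", " "])
b = "".join(["string", " "])

print(a == b)                  # True: equal contents
print(intern(a) is intern(b))  # True: one canonical object
```

Whether a is b holds before interning is an implementation detail, so only the comparison of the interned results should be relied on.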