When does Python choose to intern a string? [duplicate]
>>> s1 = "spam"
>>> s2 = "spam"
>>> s1 is s2
True
>>> q = 'asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf'
>>> r = 'asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf'
>>> q is r
False
How many characters must a string have before s1 is s2 gives False? Where is the limit? That is, how long does a string have to be before Python starts making separate copies of it?
String interning is implementation-specific and shouldn't be relied upon. Use equality testing (==) if you want to check whether two strings have the same value.
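A minimal sketch of the difference, assuming Python 3 (the variable names are only illustrative). The second string is built at runtime so it is not the same compile-time constant as the literal. Whether `is` returns True for it is up to the implementation, so only the `==` result should be relied upon:

```python
# s1 is a literal; s2 is assembled at runtime from two pieces.
s1 = "spam"
s2 = "".join(["sp", "am"])

# Equality compares contents and is the right test for strings.
same_value = (s1 == s2)

# Identity compares objects; the result is an implementation detail
# and may differ between interpreters and versions.
same_object = (s1 is s2)

print(same_value)   # True
print(same_object)  # implementation-dependent
```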
If, for some bizarre reason, you do want to force the identity comparison to be true, use the intern function (a builtin in Python 2; sys.intern in Python 3):
>>> a = intern('12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp')
>>> b = intern('12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp')
>>> a is b
True
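For Python 3, where the builtin moved to the sys module, the same idea might look like the sketch below. The two strings are built at runtime from slices (my own choice for illustration) so that they start out as separate objects before being interned:

```python
import sys

text = "12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp"

# Build two equal strings at runtime, then intern each of them.
a = sys.intern("".join([text[:10], text[10:]]))
b = sys.intern("".join([text[:20], text[20:]]))

print(a == b)  # True
print(a is b)  # True: both names refer to the single interned object
```

sys.intern returns the canonical interned copy, so it is the return value that must be kept and compared, not the original argument.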
Here is part of a comment about interned strings from the CPython 2.5.0 source file stringobject.h:
/* ... ... This is generally restricted to strings that **"look like" Python identifiers**, although the intern() builtin can be used to force interning of any string ... ... */
Accordingly, string literals that contain only underscores, digits, or letters will be interned. In your example, q and r contain ;, so they will not be interned.
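To see which strings fall under that rule of thumb, here is a rough sketch. The helper looks_like_identifier is hypothetical (it is not part of Python or CPython) and only approximates the "looks like an identifier" test as ASCII letters, digits, and underscores. It does not tell you whether a given interpreter will actually intern the string:

```python
import re

# Hypothetical helper: matches strings made up only of ASCII letters,
# digits, and underscores, following the rule of thumb quoted above.
_NAME_CHARS = re.compile(r"[A-Za-z0-9_]*")

def looks_like_identifier(s: str) -> bool:
    return _NAME_CHARS.fullmatch(s) is not None

print(looks_like_identifier("spam"))  # True
print(looks_like_identifier("asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf"))  # False
```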