How is the 'is' keyword implemented in Python?

... the is keyword that can be used for equality in strings.

>>> s = 'str'
>>> s is 'str'
True
>>> s is 'st'
False

I tried both __is__() and __eq__() but they didn't work.

>>> class MyString:
...   def __init__(self):
...     self.s = 'string'
...   def __is__(self, s):
...     return self.s == s
...
>>>
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work
False
>>>
>>> class MyString:
...   def __init__(self):
...     self.s = 'string'
...   def __eq__(self, s):
...     return self.s == s
...
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work, but again failed
False
>>>

Solution 1:

Testing strings with is only works when the strings are interned. Unless you really know what you're doing and explicitly interned the strings you should never use is on strings.

is tests for identity, not equality. That means Python simply compares the memory address a object resides in. is basically answers the question "Do I have two names for the same object?" - overloading that would make no sense.

For example, ("a" * 100) is ("a" * 100) is False. Usually Python writes each string into a different memory location, interning mostly happens for string literals.

Solution 2:

The is operator is equivalent to comparing id(x) values. For example:

>>> s1 = 'str'
>>> s2 = 'str'
>>> s1 is s2
True
>>> id(s1)
4564468760
>>> id(s2)
4564468760
>>> id(s1) == id(s2)  # equivalent to `s1 is s2`
True

id is currently implemented to use pointers as the comparison. So you can't overload is itself, and AFAIK you can't overload id either.

So, you can't. Unusual in python, but there it is.

Solution 3:

The Python is keyword tests object identity. You should NOT use it to test for string equality. It may seem to work frequently because Python implementations, like those of many very high level languages, performs "interning" of strings. That is to say that string literals and values are internally kept in a hashed list and those which are identical are rendered as references to the same object. (This is possible because Python strings are immutable).

However, as with any implementation detail, you should not rely on this. If you want to test for equality use the == operator. If you truly want to test for object identity then use is --- and I'd be hard-pressed to come up with a case where you should care about string object identity. Unfortunately you can't count on whether two strings are somehow "intentionally" identical object references because of the aforementioned interning.

Solution 4:

The is keyword compares objects (or, rather, compares if two references are to the same object).

Which is, I think, why there's no mechanism to provide your own implementation.

It happens to work sometimes on strings because Python stores strings 'cleverly', such that when you create two identical strings they are stored in one object.

>>> a = "string"
>>> b = "string"
>>> a is b
True
>>> c = "str"+"ing"
>>> a is c
True

You can hopefully see the reference vs data comparison in a simple 'copy' example:

>>> a = {"a":1}
>>> b = a
>>> c = a.copy()
>>> a is b
True
>>> a is c
False