tl;dr Does Python reuse ids? How likely is it that two objects with non-overlapping lifetimes will get the same id?

Background: I've been working on a complex project, written purely in Python 3. I've been seeing some issues in testing and spent a lot of time searching for a root cause. After some analysis, my suspicion was that when the whole test suite is run (it's orchestrated and run by a dedicated dispatcher), it reuses some mocked methods instead of instantiating new objects with their original methods. To check whether the interpreter is reusing objects, I used id().

Problem: id() usually works: it shows the object identifier and lets me tell when my call is creating a new instance rather than reusing one. But what happens when the ids of two objects are the same? The documentation says:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

The questions:

  1. When can the interpreter reuse id() values? Is it just when it randomly selects the same memory area? If it's just random, it seems extremely unlikely but it's still not guaranteed.

  2. Is there any other method to check what object I am actually referencing? I encountered a situation where I had an object with a mocked method. The object was no longer used, and the garbage collector destroyed it. After that I created a new object of the same class; it got a new id(), but its method got the same id as when it was mocked, and it actually was just a mock.

  3. Is there a way to force Python to destroy a given object instance? From the reading I did, it appears the answer is no, and that it is up to the garbage collector to act once it sees no references to the object, but I thought it was worth asking anyway.


Solution 1:

Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.

This is clearly documented:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Note the last sentence of that quote. The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.

Note that this describes CPython, the standard implementation provided by python.org. Other Python implementations, such as IronPython, Jython and PyPy, make their own choices about how to implement id(), because each handles memory and object lifetimes differently.

To address your specific questions:

  1. In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:

    >>> id(1234)
    4546982768
    >>> id(4321)
    4546982768
    

    The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. Execute the same expression again with a different integer literal, and chances are you'll see the same id() value (though a garbage collection run breaking cyclic references could free up more memory, so you might also not see the same id() again).

    So it's not random, but in CPython it is a function of the memory allocation algorithms.

  2. If you need to check specific objects, keep your own references to them. That can be a weak reference (from the weakref module) if all you need to ensure is that the object is still 'alive'.

    For example, recording an object reference first, then later checking it:

    import weakref
    
    # record
    object_ref = weakref.ref(some_object)
    
    # check if it's the same object still
    some_other_reference is object_ref()   # only true if they are the same object
    

    The weak reference won't keep the object alive, but if the object is alive then calling object_ref() will return it (it'll return None otherwise).

    You could use such a mechanism to generate really unique identifiers, see below.

  3. All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.

    The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.

    Garbage collection is only needed to break cyclic references: objects that reference only one another, with no further references to the cycle from outside. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.

    So you can cause any object to be deleted from memory (freed) by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter which objects reference a given object with the gc.get_referrers() function, but bear in mind that it doesn't give you variable names. It gives you objects, such as the dictionary that is the __dict__ attribute of a module referencing the object as a global. For code fully under your control, use gc.get_referrers() at most as a tool to remind yourself where the object is referenced from as you write the code to remove those references.
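
The points in the list above can be sketched in a few lines. This is only an illustration and assumes CPython's reference counting; the Node class and all variable names are made up for the example, and other implementations may free objects later than shown here.

```python
import gc
import weakref


class Node:
    """Hypothetical class, used only for this illustration."""

    def __init__(self):
        self.other = None


# A weak reference lets us observe liveness without keeping the object alive.
obj = Node()
probe = weakref.ref(obj)
alias = obj
print(probe() is alias)        # True: both names refer to the same live object

# Removing every strong reference frees the object right away in CPython.
del obj, alias
print(probe() is None)         # True on CPython

# Two objects that reference each other form a cycle; reference counting
# alone cannot free them, so the cyclic garbage collector has to step in.
gc.disable()                   # keep an automatic collection from interfering
a, b = Node(), Node()
a.other, b.other = b, a
cycle_probe = weakref.ref(a)
del a, b
still_alive = cycle_probe() is not None
gc.collect()                   # an explicit collection breaks the cycle
gc.enable()
print(still_alive, cycle_probe() is None)

# gc.get_referrers() returns the referring objects, not variable names.
target = Node()
holder = [target]
found_holder = any(r is holder for r in gc.get_referrers(target))
print(found_holder)
```

The weak references serve purely as probes here: they show when an object has actually been freed without affecting its lifetime.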

If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:

from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4

class UniqueIdMap(WeakKeyDictionary):
    def __init__(self, dict=None):
        super().__init__()
        # replace data with a defaultdict to generate uuids
        self.data = defaultdict(uuid4)
        if dict is not None:
            self.update(dict)

uniqueidmap = UniqueIdMap()

def uniqueid(obj):
    """Produce a unique integer id for the object.

    Object must be *hashable* and support weak references. Id is a UUID
    and should be unique across Python invocations.

    """
    return uniqueidmap[obj].int

This still produces integers, but as they are UUIDs they are not strictly guaranteed to be unique. Still, the likelihood that you'll ever encounter the same ID twice during your lifetime is smaller than that of being hit by a meteorite. See How unique is UUID?

This then gives you unique ids even for objects with non-overlapping lifetimes:

>>> class Foo:
...     pass
...
>>> id(Foo())
4547149104
>>> id(Foo())  # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo())  # but you still get a unique UUID
188632072566395632221804340107821543671

Solution 2:

  1. It can reuse the id value as soon as the object that had it has been destroyed, that is, once nothing references it any more. It is in fact likely to reuse it if you create a similar object immediately after destroying the first.

  2. If you're holding a reference (as opposed to a weak reference), the id is not reused because the object is still alive. If you're just holding the id value, you're probably doing something wrong.

  3. No, but you could delete your reference and ask the garbage collector to run with gc.collect(). It's possible for the garbage collector to fail to collect that object even if there are no really live references.
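
A small sketch of the second point, assuming CPython; the Widget class and its run method are hypothetical. While strong references are held, ids cannot collide, and comparing live objects with `is` avoids relying on stored id values at all. The last part is an assumption about how a replaced method might be detected, not something the answers above prescribe.

```python
class Widget:
    """Hypothetical class for illustration."""

    def run(self):
        return "original"


# Holding strong references keeps every object alive, so all ids differ.
widgets = [Widget() for _ in range(1000)]
print(len({id(w) for w in widgets}) == len(widgets))   # True

# Identity checks on live objects do not need id() at all.
first = widgets[0]
print(first is widgets[0], first is widgets[1])        # True False

# Each attribute access builds a fresh bound-method object, so id(obj.run)
# is the id of a short-lived temporary and says little about whether the
# method was replaced. Compare the underlying function instead.
print(first.run.__func__ is Widget.run)                # True

# Replacing the method on one instance (roughly what a mock does)
# then becomes easy to detect.
first.run = lambda: "patched"
patched = getattr(first.run, "__func__", None) is not Widget.run
print(patched)                                         # True
```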

Solution 3:

The id is unique among currently existing objects. If an object is removed by the garbage collector, a future object can have the same id (and most probably will). You have to use your own unique value (e.g. some uuid) to be sure that you are referring to a specific object. You can't force a specific object to be destroyed manually either; the most you can do is drop your references to it and, for reference cycles, call gc.collect().
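
One way to follow this advice, sketched under the assumption that you control the class definition, is to give each instance its own identifier at creation time. The Tracked class and its serial and uid attributes are hypothetical names chosen for this example.

```python
import itertools
import uuid


class Tracked:
    """Hypothetical base class that tags each instance with its own id."""

    _counter = itertools.count(1)

    def __init__(self):
        # A process-wide counter value is never handed out twice,
        # unlike id(), which can be reused after an object dies.
        self.serial = next(Tracked._counter)
        # A UUID additionally stays distinct across separate runs
        # (with overwhelming probability, not as a hard guarantee).
        self.uid = uuid.uuid4()


# Each temporary instance is freed immediately, yet its serial stays unique.
serials = [Tracked().serial for _ in range(5)]
print(serials)   # [1, 2, 3, 4, 5]
```

Because the identifier is stored on the instance, it does not depend on where the object happens to live in memory.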