Compare object instances for equality by their attributes
I have a class MyClass
, which contains two member variables foo
and bar
:
class MyClass:
def __init__(self, foo, bar):
self.foo = foo
self.bar = bar
I have two instances of this class, each of which has identical values for foo
and bar
:
x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
However, when I compare them for equality, Python returns False
:
>>> x == y
False
How can I make python consider these two objects equal?
Solution 1:
You should implement the method __eq__
:
class MyClass:
def __init__(self, foo, bar):
self.foo = foo
self.bar = bar
def __eq__(self, other):
if not isinstance(other, MyClass):
# don't attempt to compare against unrelated types
return NotImplemented
return self.foo == other.foo and self.bar == other.bar
Now it outputs:
>>> x == y
True
Note that implementing __eq__
will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo
and bar
may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.
If you are modelling an immutable type, you should also implement the data model hook __hash__
:
class MyClass:
...
def __hash__(self):
# necessary for instances to behave sanely in dicts and sets.
return hash((self.foo, self.bar))
A general solution, like the idea of looping through __dict__
and comparing values, is not advisable - it can never be truly general because the __dict__
may have uncomparable or unhashable types contained within.
N.B.: be aware that before Python 3, you may need to use __cmp__
instead of __eq__
. Python 2 users may also want to implement __ne__
, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.
Solution 2:
You override the rich comparison operators in your object.
class MyClass:
def __lt__(self, other):
# return comparison
def __le__(self, other):
# return comparison
def __eq__(self, other):
# return comparison
def __ne__(self, other):
# return comparison
def __gt__(self, other):
# return comparison
def __ge__(self, other):
# return comparison
Like this:
def __eq__(self, other):
return self._id == other._id
Solution 3:
If you're dealing with one or more classes that you can't change from the inside, there are generic and simple ways to do this that also don't depend on a diff-specific library:
Easiest, unsafe-for-very-complex-objects method
pickle.dumps(a) == pickle.dumps(b)
pickle
is a very common serialization lib for Python objects, and will thus be able to serialize pretty much anything, really. In the above snippet, I'm comparing the str
from serialized a
with the one from b
. Unlike the next method, this one has the advantage of also type checking custom classes.
The biggest hassle: due to specific ordering and [de/en]coding methods, pickle
may not yield the same result for equal objects, especially when dealing with more complex ones (e.g. lists of nested custom-class instances) like you'll frequently find in some third-party libs. For those cases, I'd recommend a different approach:
Thorough, safe-for-any-object method
You could write a recursive reflection that'll give you serializable objects, and then compare results
from collections.abc import Iterable
BASE_TYPES = [str, int, float, bool, type(None)]
def base_typed(obj):
"""Recursive reflection method to convert any object property into a comparable form.
"""
T = type(obj)
from_numpy = T.__module__ == 'numpy'
if T in BASE_TYPES or callable(obj) or (from_numpy and not isinstance(T, Iterable)):
return obj
if isinstance(obj, Iterable):
base_items = [base_typed(item) for item in obj]
return base_items if from_numpy else T(base_items)
d = obj if T is dict else obj.__dict__
return {k: base_typed(v) for k, v in d.items()}
def deep_equals(*args):
return all(base_typed(args[0]) == base_typed(other) for other in args[1:])
Now it doesn't matter what your objects are, deep equality is assured to work
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> a = RandomForestClassifier(max_depth=2, random_state=42)
>>> b = RandomForestClassifier(max_depth=2, random_state=42)
>>>
>>> deep_equals(a, b)
True
The number of comparables doesn't matter as well
>>> c = RandomForestClassifier(max_depth=2, random_state=1000)
>>> deep_equals(a, b, c)
False
My use case for this was checking deep equality among a diverse set of already trained Machine Learning models inside BDD tests. The models belonged to a diverse set of third-party libs. Certainly implementing __eq__
like other answers here suggest wasn't an option for me.
Covering all the bases
You may be in a scenario where one or more of the custom classes being compared do not have a __dict__
implementation. That's not common by any means, but it is the case of a subtype within sklearn's Random Forest classifier: <type 'sklearn.tree._tree.Tree'>
. Treat these situations on a case by case basis - e.g. specifically, I decided to replace the content of the afflicted type with the content of a method that gives me representative information on the instance (in this case, the __getstate__
method). For such, the second-to-last row in base_typed
became
d = obj if T is dict else obj.__dict__ if '__dict__' in dir(obj) else obj.__getstate__()
Edit: for the sake of organization, I replaced the hideous oneliner above with return dict_from(obj)
. Here, dict_from
is a really generic reflection made to accommodate more obscure libs (I'm looking at you, Doc2Vec)
def isproperty(prop, obj):
return not callable(getattr(obj, prop)) and not prop.startswith('_')
def dict_from(obj):
"""Converts dict-like objects into dicts
"""
if isinstance(obj, dict):
# Dict and subtypes are directly converted
d = dict(obj)
elif '__dict__' in dir(obj):
# Use standard dict representation when available
d = obj.__dict__
elif str(type(obj)) == 'sklearn.tree._tree.Tree':
# Replaces sklearn trees with their state metadata
d = obj.__getstate__()
else:
# Extract non-callable, non-private attributes with reflection
kv = [(p, getattr(obj, p)) for p in dir(obj) if isproperty(p, obj)]
d = {k: v for k, v in kv}
return {k: base_typed(v) for k, v in d.items()}
Do mind none of the above methods yield True
for objects with the same key-value pairs in a differing order, as in
>>> a = {'foo':[], 'bar':{}}
>>> b = {'bar':{}, 'foo':[]}
>>> pickle.dumps(a) == pickle.dumps(b)
False
But if you want that you could use Python's built-in sorted
method beforehand anyway.