How do I create a hash function that will return True if any of the fields match?
I have this class. The goal of this class is to encapsulate the logic for custom set operations:
class Reach(object):
def __init__(self, device_id=None, user_id=None):
self.device_id = device_id
self.user_id = user_id
def __eq__(self, other):
if isinstance(other, Reach):
# print(self.device_id, self.user_id)
# print(other.device_id, other.user_id)
return self.device_id == other.device_id or \
self.user_id == other.user_id
return False
def __hash__(self):
return hash(f"device_id={self.device_id}&user_id={self.user_id}")
This is my test case:
if __name__ == "__main__":
# Basic equality test
p1 = Reach(device_id="did1")
p2 = Reach(device_id="did1", user_id="tauid1")
assert p1 == p2
s1 = set([p1])
s2 = set([p2])
assert len(s1 & s2) == 1 # Failing here
# Transmutivity test
p1 = Reach(device_id="did1")
p2 = Reach(device_id="did1", user_id="tauid1")
p3 = Reach(user_id="tauid1")
assert p1 != p3
assert p1 == p2 == p3
s1, s2, s3 = set([p1]), set([p2]), set([p3])
assert len(s1 & s3) == 0
assert len(s1 | s3) == 2
assert len(s1 & s2) == 1
assert len(s2 & s3) == 1
assert len(s1 & s2 & s3) == 1
assert len(s1 | s2 | s3) == 1
I realized that since python set operations uses hash, assert len(s1 & s2) == 1
fails because it's comparing the hash of device_id=did1&user_id=None
with device_id=did1&user_id=tauid1
which is not the same
What do I need to change to achieve my desired effect?
Solution 1:
This is easily accomplished, but the result is pretty much useless. Suppose we have four strings a
, b
, x
, and y
.
You want Reach(a, b)
to have the same hash as Reach(a, y)
, because the objects compare equal via your __eq__
, and for a hash to be of any use at all object equality has to imply hash equality.
But for the same reason, you also need the hashes of Reach(a, y)
and Reach(x, y)
to be equal.
So for any four strings, you need
hash(Reach(a, b)) == hash(Reach(x, y))
The only way to achieve that is with a hash function that's a constant, ignoring its argument. Like
def __hash__(self):
return 12345
That will "work", but set operations will degenerate into extra-slow versions of linear search (all objects will map to the same hash bucket, and the same collision chain).