How do I create a hash function that will return True if any of the fields match?

I have this class. The goal of this class is to encapsulate the logic for custom set operations:

class Reach(object):

    def __init__(self, device_id=None, user_id=None):
        self.device_id = device_id
        self.user_id = user_id

    def __eq__(self, other):
        if isinstance(other, Reach):
            # print(self.device_id, self.user_id)
            # print(other.device_id, other.user_id)
            return self.device_id == other.device_id or \
                    self.user_id == other.user_id
        return False

    def __hash__(self):
        return hash(f"device_id={self.device_id}&user_id={self.user_id}")

This is my test case:

if __name__ == "__main__":

    # Basic equality test
    p1 = Reach(device_id="did1")
    p2 = Reach(device_id="did1", user_id="tauid1")
    assert p1 == p2
    s1 = set([p1])
    s2 = set([p2])
    assert len(s1 & s2) == 1                # Failing here 

    # Transitivity test
    p1 = Reach(device_id="did1")
    p2 = Reach(device_id="did1", user_id="tauid1")
    p3 = Reach(user_id="tauid1")
    assert p1 != p3                                   
    assert p1 == p2 == p3
    s1, s2, s3 = set([p1]), set([p2]), set([p3])
    assert len(s1 & s3) == 0
    assert len(s1 | s3) == 2
    assert len(s1 & s2) == 1
    assert len(s2 & s3) == 1
    assert len(s1 & s2 & s3) == 1
    assert len(s1 | s2 | s3) == 1

I realized that since Python set operations use the hash, assert len(s1 & s2) == 1 fails because it's comparing the hash of device_id=did1&user_id=None with the hash of device_id=did1&user_id=tauid1, and those are not the same.

What do I need to change to achieve my desired effect?


Solution 1:

This is easily accomplished, but the result is pretty much useless. Suppose we have four strings a, b, x, and y.

You want Reach(a, b) to have the same hash as Reach(a, y), because the objects compare equal via your __eq__, and for a hash to be of any use at all, object equality has to imply hash equality.

But for the same reason, you also need the hashes of Reach(a, y) and Reach(x, y) to be equal.

So for any four strings, you need

    hash(Reach(a, b)) == hash(Reach(x, y))
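
To make the chain concrete, here is a small check using the question's __eq__. The values "a", "b", "x" and "y" are just placeholders for the four strings, and __hash__ is left out on purpose since only equality is being tested. Treat it as a sketch, not as part of the original code.

```python
class Reach:
    """Same __eq__ as in the question; hashing is deliberately omitted."""

    def __init__(self, device_id=None, user_id=None):
        self.device_id = device_id
        self.user_id = user_id

    def __eq__(self, other):
        if isinstance(other, Reach):
            return (self.device_id == other.device_id
                    or self.user_id == other.user_id)
        return False


# Placeholder values standing in for the four strings a, b, x, y.
r_ab = Reach("a", "b")
r_ay = Reach("a", "y")
r_xy = Reach("x", "y")

chain_left = r_ab == r_ay    # device_id matches
chain_right = r_ay == r_xy   # user_id matches
ends_equal = r_ab == r_xy    # neither field matches

print(chain_left, chain_right, ends_equal)
```

Both links of the chain compare equal while the two ends do not. Since equal objects must hash equally, the two ends still need the same hash, which is the requirement stated above.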

The only way to achieve that is with a hash function that ignores its argument and returns a constant, like:

def __hash__(self):
    return 12345

That will "work", but set operations will degenerate into extra-slow versions of linear search (all objects will map to the same hash bucket, and the same collision chain).
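
As a rough check, here is the question's class with the constant hash dropped in, run against the pairwise set operations from the test case. The constant 12345 is arbitrary, and the tuple name `pairwise` is only for illustration.

```python
class Reach:
    def __init__(self, device_id=None, user_id=None):
        self.device_id = device_id
        self.user_id = user_id

    def __eq__(self, other):
        if isinstance(other, Reach):
            return (self.device_id == other.device_id
                    or self.user_id == other.user_id)
        return False

    def __hash__(self):
        # Constant hash: every Reach lands in the same bucket,
        # so set lookups fall back on __eq__ alone.
        return 12345


p1 = Reach(device_id="did1")
p2 = Reach(device_id="did1", user_id="tauid1")
p3 = Reach(user_id="tauid1")
s1, s2, s3 = {p1}, {p2}, {p3}

# Pairwise results from the question's test case.
pairwise = (len(s1 & s2), len(s1 & s3), len(s2 & s3), len(s1 | s3))
print(pairwise)

# Note: operations chained over all three sets are not checked here.
# Because == is not transitive for this class, their result may
# depend on the order in which the sets are combined.
```

The pairwise checks, including the originally failing len(s1 & s2) == 1, come out as the question expects. The cost is that every membership test now compares against every stored element.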