How to provide additional initialization for a subclass of namedtuple?
Suppose I have a namedtuple like this:

    from collections import namedtuple

    EdgeBase = namedtuple("EdgeBase", "left, right")
I want to implement a custom hash function for this, so I create the following subclass:
    class Edge(EdgeBase):
        def __hash__(self):
            return hash(self.left) * hash(self.right)
Since the object is immutable, I want the hash value to be calculated only once, so I do this:
    class Edge(EdgeBase):
        def __init__(self, left, right):
            self._hash = hash(self.left) * hash(self.right)

        def __hash__(self):
            return self._hash
This appears to work, but I am really not sure about subclassing and initialization in Python, especially with tuples. Are there any pitfalls to this solution? Is there a recommended way to do this? Is it fine? Thanks in advance.
edit for 2017: turns out namedtuple isn't a great idea. attrs is the modern alternative.
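A rough sketch of the same class rewritten with attrs (just an illustration; the exact decorator options depend on the attrs version you use):

    import attr

    @attr.s(frozen=True)  # frozen + eq (the default) makes instances hashable
    class Edge(object):
        left = attr.ib()
        right = attr.ib()

    hash(Edge(1, 2)) == hash(Edge(1, 2))
    # True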
    class Edge(EdgeBase):
        def __new__(cls, left, right):
            self = super(Edge, cls).__new__(cls, left, right)
            self._hash = hash(self.left) * hash(self.right)
            return self

        def __hash__(self):
            return self._hash
__new__ is what you want to call here because tuples are immutable. Immutable objects are created in __new__ and then returned to the user, instead of being populated with data in __init__.
cls has to be passed twice to the super call on __new__ because __new__ is, for historical/odd reasons, implicitly a staticmethod.
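On Python 3 you can use the zero-argument form of super, which drops the explicit class from the call, though cls still has to be forwarded to __new__ (a sketch of the same class):

    class Edge(EdgeBase):
        def __new__(cls, left, right):
            # zero-argument super(); cls is still passed explicitly to __new__
            self = super().__new__(cls, left, right)
            self._hash = hash(self.left) * hash(self.right)
            return self

        def __hash__(self):
            return self._hash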
In Python 3.7+, you can now use dataclasses to build hashable classes with ease.
Code
Assuming int types for left and right, we use the default hashing via the unsafe_hash+ keyword:
    import dataclasses as dc

    @dc.dataclass(unsafe_hash=True)
    class Edge:
        left: int
        right: int

    hash(Edge(1, 2))
    # 3713081631934410656
Now we can use these (mutable) hashable objects as elements in a set or as keys in a dict.
    {Edge(1, 2), Edge(1, 2), Edge(2, 1), Edge(2, 3)}
    # {Edge(left=1, right=2), Edge(left=2, right=1), Edge(left=2, right=3)}
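Likewise as dict keys, e.g. mapping edges to weights (weights is just an illustrative name):

    weights = {Edge(1, 2): 0.5, Edge(2, 3): 1.5}
    weights[Edge(1, 2)]
    # 0.5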
Details
We can alternatively override the __hash__ function:
    @dc.dataclass
    class Edge:
        left: int
        right: int

        def __post_init__(self):
            # Add custom hashing function here
            self._hash = hash((self.left, self.right))  # emulates default

        def __hash__(self):
            return self._hash

    hash(Edge(1, 2))
    # 3713081631934410656
Expanding on @ShadowRanger's comment, the OP's custom hash function is not reliable. In particular, the attribute values can be interchanged, e.g. hash(Edge(1, 2)) == hash(Edge(2, 1)), which is likely unintended.
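To illustrate why (the exact tuple hash values depend on the CPython version, but the comparison holds):

    # Multiplication commutes, so the OP's hash collides for swapped fields:
    hash(1) * hash(2) == hash(2) * hash(1)
    # True

    # Hashing the tuple of fields keeps the order significant:
    hash((1, 2)) == hash((2, 1))
    # False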
+Note: the name "unsafe" suggests that the default hash will be used even though the object is mutable. This may be undesired, particularly within a dict expecting immutable keys. Immutable hashing can be turned on by freezing the dataclass instead (frozen=True, which together with the default eq=True generates a __hash__). See also more on hashing logic in dataclasses and a related issue.
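A sketch of that frozen variant (same fields as above):

    @dc.dataclass(frozen=True)
    class Edge:
        left: int
        right: int

    edge = Edge(1, 2)
    # edge.left = 3  # raises dataclasses.FrozenInstanceError
    hash(edge)  # __hash__ is generated because frozen=True and eq=True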
The code in the question could benefit from a super call in the __init__ in case it ever gets subclassed in a multiple inheritance situation, but otherwise is correct.
    class Edge(EdgeBase):
        def __init__(self, left, right):
            super(Edge, self).__init__(left, right)
            self._hash = hash(self.left) * hash(self.right)

        def __hash__(self):
            return self._hash
While tuples are read-only, only the tuple part of their subclasses is read-only; other attributes may be written as usual, which is what allows the assignment to _hash regardless of whether it's done in __init__ or __new__. You can make the subclass fully read-only by setting its __slots__ to (), which has the added benefit of saving memory, but then you wouldn't be able to assign to _hash.
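A quick illustration of that trade-off (SlottedEdge is just an illustrative name):

    class SlottedEdge(EdgeBase):
        __slots__ = ()  # no per-instance __dict__: read-only and memory-efficient

    e = SlottedEdge(1, 2)
    e._hash = 42  # AttributeError: 'SlottedEdge' object has no attribute '_hash'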