How does Python's cmp_to_key function work?
I came across this function here.
I am baffled as to how this would be implemented -- how does the key
function generated by cmp_to_key
know what "position" a given element should be without checking how the given element compares with every other element of interest?
Solution 1:
The cmp_to_key
method returns a special object that acts as a surrogate key:
class K(object):
__slots__ = ['obj']
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(self.obj, other.obj) >= 0
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
def __hash__(self):
raise TypeError('hash not implemented')
When sorting, each key will get compared to most other keys in the sequence. Is this element at position 0 lower than or greater than that other object?
Whenever that happens, the special method hooks are invoked, so __lt__
or __gt__
is called, which the surrogate key turns into a call to the cmp
method instead.
So the list [1, 2, 3]
is sorted as [K(1), K(2), K(3)]
, and if, say, K(1)
is compared with K(2)
to see if K(1)
is lower, then K(1).__lt__(K(2))
is called, which is translated to mycmp(1, 2) < 0
.
This is how the old cmp
method was working anyway; return -1, 0 or 1 depending on wether the first argument is lower than, equal to or greater than the second argument. The surrogate key translates those numbers back to boolean values for the comparison operators.
At no point does the surrogate key need to know anything about absolute positions. It only needs to know about one other object it is being compared with, and the special method hooks provide that other object.