Is there a clever way to pass the key to defaultdict's default_factory?

A class has a constructor which takes one parameter:

class C(object):
    def __init__(self, v):
        self.v = v
        ...

Somewhere in the code, it is useful for values in a dict to know their keys.
I want to use a defaultdict with the key passed to newborn default values:

d = defaultdict(lambda : C(here_i_wish_the_key_to_be))

Any suggestions?


Solution 1:

It hardly qualifies as clever - but subclassing is your friend:

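from collections import defaultdict

class keydefaultdict(defaultdict):
    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        else:
            # Unlike plain defaultdict, pass the key to the factory.
            ret = self[key] = self.default_factory(key)
            return ret

d = keydefaultdict(C)
d[x] # returns C(x)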
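
To make the behaviour concrete, here is a small self-contained sketch. It repeats the subclass together with a minimal stand-in for the question's class C (the constructor body is an assumption, since the original elides it) and checks that the factory runs only on the first lookup of a key:

```python
from collections import defaultdict

class C(object):
    # Minimal stand-in for the class in the question.
    def __init__(self, v):
        self.v = v

class keydefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory(key)
        return value

d = keydefaultdict(C)
first = d['a']           # missing key: C('a') is built and stored
second = d['a']          # key now present: __missing__ is not called
print(first.v)           # a
print(first is second)   # True
print(d.get('b'))        # None: get() does not trigger __missing__
print('b' in d)          # False: membership tests do not create entries
```

Note that only d[key] goes through __missing__; d.get(key) and "key in d" leave the dictionary unchanged.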

Solution 2:

No, there is not.

The defaultdict implementation cannot be configured to pass the missing key to the default_factory out of the box. Your only option is to implement your own defaultdict subclass, as suggested by @JochenRitzel, above.

But that isn't "clever" or nearly as clean as a standard library solution would be (if it existed). Thus the answer to your succinct, yes/no question is clearly "No".

It's too bad the standard library is missing such a frequently needed tool.

Solution 3:

I don't think you need defaultdict here at all. Why not just use the dict.setdefault method?

>>> d = {}
>>> d.setdefault('p', C('p')).v
'p'

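That will, of course, create a new instance of C on every call, even when the key is already present, because the argument is evaluated before setdefault runs. If that is an issue, I think this simpler approach will do: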

>>> d = {}
>>> if 'e' not in d: d['e'] = C('e')

As far as I can see, it would be quicker than defaultdict or any other alternative.
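
The difference between the two approaches can be seen by counting constructor calls. This is only an illustrative sketch: the created counter is a hypothetical addition to C, not part of the original class.

```python
class C(object):
    created = 0  # hypothetical counter, added only for this illustration

    def __init__(self, v):
        C.created += 1
        self.v = v

# setdefault: the argument C('p') is evaluated on every call,
# even though only the first instance is ever stored.
d = {}
for _ in range(3):
    d.setdefault('p', C('p'))
setdefault_count = C.created

# Membership test: C('p') is constructed only when the key is missing.
C.created = 0
e = {}
for _ in range(3):
    if 'p' not in e:
        e['p'] = C('p')
membership_count = C.created

print(setdefault_count)   # 3
print(membership_count)   # 1
```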

ETA regarding the speed of the "in" test vs. using a try-except clause:

>>> import timeit
>>> def g():
    d = {}
    if 'a' in d:
        return d['a']


>>> timeit.timeit(g)
0.19638929363557622
>>> def f():
    d = {}
    try:
        return d['a']
    except KeyError:
        return


>>> timeit.timeit(f)
0.6167065411074759
>>> def k():
    d = {'a': 2}
    if 'a' in d:
        return d['a']


>>> timeit.timeit(k)
0.30074866358404506
>>> def p():
    d = {'a': 2}
    try:
        return d['a']
    except KeyError:
        return


>>> timeit.timeit(p)
0.28588609450770264

Solution 4:

Here's a working example of a dictionary that automatically adds a value. The demonstration task is finding duplicate files in /usr/include. Note that customizing the dictionary PathDict requires only four lines:

import os

class FullPaths:

    def __init__(self,filename):
        self.filename = filename
        self.paths = set()

    def record_path(self,path):
        self.paths.add(path)

class PathDict(dict):

    def __missing__(self, key):
        ret = self[key] = FullPaths(key)
        return ret

if __name__ == "__main__":
    pathdict = PathDict()
    for root, _, files in os.walk('/usr/include'):
        for f in files:
            path = os.path.join(root,f)
            pathdict[f].record_path(path)
    for fullpath in pathdict.values():
        if len(fullpath.paths) > 1:
            print("{} located in {}".format(fullpath.filename,','.join(fullpath.paths)))