Unwanted behaviour from dict.fromkeys

I'd like to initialise a dictionary of sets (in Python 2.6) using dict.fromkeys, but the resulting structure behaves strangely. More specifically:

>>> x = {}.fromkeys(range(10), set([]))
>>> x
{0: set([]), 1: set([]), 2: set([]), 3: set([]), 4: set([]), 5: set([]), 6: set([]), 7: set([]), 8: set([]), 9: set([])}
>>> x[5].add(3)
>>> x
{0: set([3]), 1: set([3]), 2: set([3]), 3: set([3]), 4: set([3]), 5: set([3]), 6: set([3]), 7: set([3]), 8: set([3]), 9: set([3])}

I obviously don't want to add 3 to all sets, only to the set that corresponds to x[5]. Of course, I can avoid the problem by initialising x without fromkeys, but I'd like to understand what I'm missing here.


The second argument to dict.fromkeys is just a value. You've created a dictionary that has the same set as the value for every key. Presumably you understand the way this works:

>>> a = set()
>>> b = a
>>> b.add(1)
>>> b
set([1])
>>> a
set([1])

You're seeing the same behavior here: in your case, x[0], x[1], x[2] (etc.) are all different ways to access the exact same set object.

This is a bit easier to see with objects whose string representation includes their memory address, where you can see that they're identical:

>>> dict.fromkeys(range(2), object())
{0: <object object at 0x1001da080>,
 1: <object object at 0x1001da080>}
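
If you'd rather not compare addresses by eye, the same check can be done with the `is` operator and `id()`. This is only a small sketch; the variable names `same_object`, `distinct_ids` and `shared_mutation` are made up for illustration.

```python
# Build the dictionary the same way as in the question.
x = dict.fromkeys(range(10), set())

# Every value is the very same object, not ten equal-looking sets.
same_object = all(x[k] is x[0] for k in x)

# Counting distinct identities among the values gives exactly one.
distinct_ids = len(set(id(v) for v in x.values()))

# Mutating the set through one key is therefore visible through all keys.
x[5].add(3)
shared_mutation = 3 in x[0]

print(same_object, distinct_ids, shared_mutation)
```

All three checks point at the same conclusion: there is one set, reachable under ten keys.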

To get a separate set for each key, you can pass a generator expression to dict():

x = dict((i, set()) for i in range(10))

In Python 2.7 and later, including Python 3, you can use a dictionary comprehension (not available in Python 2.6):

x = {i: set() for i in range(10)}

In both cases, the expression set() is evaluated once per element, so each key gets its own set. With fromkeys, by contrast, the argument is evaluated once and every key refers to that same object; nothing is copied.
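
If you need this in more than one place, the generator-expression approach can be wrapped in a small helper. Note that `fromkeys_factory` is a hypothetical name invented for this sketch, not something in the standard library; it takes a callable and calls it once per key.

```python
def fromkeys_factory(keys, factory):
    """Hypothetical helper: like dict.fromkeys, but calls factory() for each key."""
    return dict((k, factory()) for k in keys)


# Pass the callable `set` itself, not the result of calling it.
x = fromkeys_factory(range(10), set)
x[5].add(3)

# Only the set stored under key 5 was modified.
changed_keys = [k for k in x if x[k]]
all_distinct = len(set(id(v) for v in x.values())) == len(x)

print(changed_keys, all_distinct)
```

The generator-expression form also works on Python 2.6, which is why the helper uses it instead of a dictionary comprehension.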


This follows from the implementation in CPython's dictobject.c:

while (_PyDict_Next(seq, &pos, &key, &oldvalue, &hash))
{
    Py_INCREF(key);
    Py_INCREF(value);
    if (insertdict(mp, key, hash, value))
        return NULL;
}

Here, value is your "set([])". It is evaluated only once, before fromkeys is called; the loop then increments the reference count of that one object and inserts it under each key. It is not re-evaluated each time an entry is added to the dict.
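
For readers who don't read C, the effect of that loop can be approximated in Python. This is a rough sketch for illustration only, not CPython's actual code; `fromkeys_sketch` is a made-up name.

```python
def fromkeys_sketch(keys, value=None):
    """Rough Python approximation of what the C loop does (illustrative only)."""
    result = {}
    for key in keys:
        # `value` was already evaluated by the caller before this function ran.
        # Each iteration stores another reference to that one object.
        result[key] = value
    return result


shared = set()
d = fromkeys_sketch(range(3), shared)

# Every value in the result is the object the caller passed in.
all_shared = all(v is shared for v in d.values())

print(all_shared)
```

Nothing in the loop ever calls set() again, which is why adding to one value appears to change them all.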