Unwanted behaviour from dict.fromkeys
I'd like to initialise a dictionary of sets (in Python 2.6) using dict.fromkeys, but the resulting structure behaves strangely. More specifically:
>>> x = {}.fromkeys(range(10), set([]))
>>> x
{0: set([]), 1: set([]), 2: set([]), 3: set([]), 4: set([]), 5: set([]), 6: set([]), 7: set([]), 8: set([]), 9: set([])}
>>> x[5].add(3)
>>> x
{0: set([3]), 1: set([3]), 2: set([3]), 3: set([3]), 4: set([3]), 5: set([3]), 6: set([3]), 7: set([3]), 8: set([3]), 9: set([3])}
I obviously don't want to add 3 to all sets, only to the set that corresponds to x[5]. Of course, I can avoid the problem by initialising x without fromkeys, but I'd like to understand what I'm missing here.
The second argument to dict.fromkeys is just a value. You've created a dictionary that has the same set as the value for every key. Presumably you understand the way this works:
>>> a = set()
>>> b = a
>>> b.add(1)
>>> b
set([1])
>>> a
set([1])
You're seeing the same behavior here; in your case, x[0], x[1], x[2] (etc.) are all different ways to access the exact same set object.
This is a bit easier to see with objects whose string representation includes their memory address, where you can see that they're identical:
>>> dict.fromkeys(range(2), object())
{0: <object object at 0x1001da080>,
1: <object object at 0x1001da080>}
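The same identity can be checked directly with the `is` operator instead of reading memory addresses. A minimal sketch, written to run on both Python 2.6 and Python 3 (the variable name x follows the question):

```python
x = dict.fromkeys(range(3), set())

# Every key maps to one and the same set object.
assert x[0] is x[1] is x[2]
assert len(set(id(v) for v in x.values())) == 1

# A change made through one key is visible through all of them.
x[1].add(3)
assert x[0] == x[2] == set([3])
```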
You can give each key its own set with a generator expression:
x = dict( (i,set()) for i in range(10) )
In Python 2.7 and later (including Python 3), you can use a dictionary comprehension:
x = { i : set() for i in range(10) }
In both cases, the expression set() is evaluated once per element, so every key gets its own new set, instead of being evaluated once with the single resulting object shared by every key.
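To make the difference concrete, here is a small side-by-side sketch. The names shared and separate are made up for illustration, and the code should run on Python 2.6 as well as Python 3:

```python
keys = range(10)

# One set() call in total: all ten keys refer to the same object.
shared = dict.fromkeys(keys, set())

# One set() call per key: each key gets its own object.
separate = dict((i, set()) for i in keys)

shared[5].add(3)
separate[5].add(3)

# Collect the keys whose set now contains 3.
shared_with_3 = sorted(k for k in shared if 3 in shared[k])
separate_with_3 = sorted(k for k in separate if 3 in separate[k])
```

With fromkeys, 3 shows up under every key; with the generator expression, only under key 5.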
This happens because of the following loop in dictobject.c:
while (_PyDict_Next(seq, &pos, &key, &oldvalue, &hash))
{
    Py_INCREF(key);
    Py_INCREF(value);
    if (insertdict(mp, key, hash, value))
        return NULL;
}
Here value is your set([]). It is evaluated only once, before fromkeys is called; the loop then just increments the reference count of that one resulting object and inserts it under each key. It is not re-evaluated each time a key is added to the dict.
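In Python terms, the loop behaves roughly like the sketch below. fromkeys_sketch is a hypothetical stand-in written for illustration, not the real implementation, and it ignores details such as subclass handling:

```python
def fromkeys_sketch(keys, value=None):
    """Rough pure-Python approximation of dict.fromkeys (illustration only)."""
    result = {}
    for key in keys:
        # `value` was already evaluated by the caller before this function ran;
        # each assignment stores another reference to that one object.
        result[key] = value
    return result


x = fromkeys_sketch(range(3), set())
x[0].add(3)
```

Because set() is evaluated at the call site, the function only ever sees the finished object and has no way to create a new one per key.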