Python dictionary keys. "In" complexity
First, key in d.keys()
is guaranteed to give you the same value as key in d
for any dict d
.
And the in
operation on a dict
, or the dict_keys
object you get back from calling keys()
on it (in 3.x), is not O(N), it's O(1).
There's no real "optimization" going on; it's just that using the hash is the obvious way to implement __contains__
on a hash table, just as it's the obvious way to implement __getitem__
.
You may ask where this is guaranteed.
Well, it's not. Mapping Types defines dict
as, basically, a hash table implementation of collections.abc.Mapping
. There's nothing stopping someone from creating a hash table implementation of a Mapping, but still providing O(N) searches. But it would be extra work to make such a bad implementation, so why would they?
If you really need to prove it to yourself, you can test every implementation you care about (with a profiler, or by using some type with a custom __hash__
and __eq__
that logs calls, or…), or read the source.
In 2.x, you do not want to call keys
, because that generates a list
of the keys, instead of a KeysView
. You could use iterkeys
, but that may generate an iterator or something else that's not O(1). So, just use the dict itself as a sequence.
Even in 3.x, you don't want to call keys
, because there's no need to. Iterating a dict
, checking its __contains__
, and in general treating it like a sequence is always equivalent to doing the same thing to its keys, so why bother? (And of course building the trivial KeyView
, and accessing through it, are going to add a few nanoseconds to your running time and a few keystrokes to your program.)
(It's not quite clear that using sequence operations is equivalent for d.keys()
/d.iterkeys()
and d
in 2.x. Other than performance issues, they are equivalent in every CPython, Jython, IronPython, and PyPy implementation, but it doesn't seem to be stated anywhere the way it is in 3.x. And it doesn't matter; just use key in d
.)
While we're at it, note that this:
if(dict[key] != None):
… is not going to work. If the key
is not in the dict
, this will raise KeyError
, not return None
.
Also, you should never check None
with ==
or !=
; always use is
.
You can do this with a try
—or, more simply, do if dict.get(key, None) is not None
. But again, there's no reason to do so. Also, that won't handle cases where None
is a perfectly valid item. If that's the case, you need to do something like sentinel = object(); if dict.get(key, sentinel) is not sentinel:
.
So, the right thing to write is:
if key in d:
More generally, this is not true:
I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element
The in
operator, like most other operators, is just a call to a __contains__
method (or the equivalent for a C/Java/.NET/RPython builtin). list
implements it by iterating the list and comparing each element; dict
implements it by hashing the value and looking up the hash; blist.blist
implements it by walking a B+Tree; etc. So, it could be O(n), O(1), O(log n), or something completely different.
In Python 2 dict.keys()
creates the whole list of keys first that's why it is an O(N)
operation, while key in dict
is an O(1)
operation.
if(dict[key] != None)
will raise KeyError
if key is not found in the dict, so it is not equivalent to the first code.
Python 2 results:
>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
1000000 loops, best of 3: 170 ns per loop
>>> %timeit 10000 in dic.keys()
100 loops, best of 3: 4.98 ms per loop
>>> %timeit 10000 in dic.iterkeys()
1000 loops, best of 3: 402 us per loop
>>> %timeit 10000 in dic.viewkeys()
1000000 loops, best of 3: 457 ns per loop
In Python 3 dict.keys()
returns a view object which is quite faster than Python 2's keys()
but still slower simple normal key in dict
:
Python 3 results:
>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
1000000 loops, best of 3: 295 ns per loop
>>> %timeit 10000 in dic.keys()
1000000 loops, best of 3: 475 ns per loop
Use just:
if key in dict:
#code
The proper way to do this would be
if key in dict:
do stuff
the in operator is O(1) for dictionaries and sets in python.
The in operator for dict has average case time-complexity of O(1). For detailed information about time complexity of other dict() methods, visit this link.