How to filter dictionary keys based on their corresponding values
I have:
dictionary = {"foo":12, "bar":2, "jim":4, "bob": 17}
I want to iterate over this dictionary, but over the values instead of the keys, so I can use the values in another function.
For example, I want to test which dictionary values are greater than 6, and then store their keys in a list. My code looks like this:
list = []
for c in dictionary:
    if c > 6:
        list.append(dictionary[c])
print list
And then, in a perfect world, list would contain all the keys whose value is greater than 6.
However, my for loop only iterates over the keys; I would like it to iterate over the values!
Any help is greatly appreciated. Thank you.
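For reference, here is a minimal corrected version of the loop above, assuming Python 3 (so print is a function). The original compares the keys (c > 6) and appends the values; this sketch iterates over dictionary.items() so it can test each value and store the matching key. The list is renamed keys so the built-in list type is not shadowed.

```python
dictionary = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}

# Named `keys` rather than `list` so the built-in list type is not shadowed
keys = []
for key, value in dictionary.items():  # iterate over (key, value) pairs
    if value > 6:                      # test the value...
        keys.append(key)               # ...but store the key

print(keys)  # ['foo', 'bob'] on Python 3.7+
```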
Solution 1:
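>>> d = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}
>>> [k for k, v in d.items() if v > 6] # Use d.iteritems() on Python 2.x
['foo', 'bob']
(On Python 3.7+, dicts preserve insertion order; on older versions the keys may come back in a different order, e.g. ['bob', 'foo'].)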
Update: I'd also like to showcase the solution by @glarrain, which I tend to use nowadays:
[k for k in d if d[k] > 6]
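A quick sketch (Python 3.7+ assumed for the ordering in the comments) confirming that the two forms select the same keys. If you also want to keep the matching values, a dict comprehension over d.items() filters the dictionary itself.

```python
d = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}

# Form 1: unpack each (key, value) pair from d.items()
via_items = [k for k, v in d.items() if v > 6]

# Form 2: iterate over the keys and look each value up with d[k]
via_lookup = [k for k in d if d[k] > 6]

# Both visit the keys in the same order, so the lists match
assert via_items == via_lookup

# To keep the matching values as well, build a filtered dict
filtered = {k: v for k, v in d.items() if v > 6}
print(filtered)  # {'foo': 12, 'bob': 17}
```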
This works unchanged on both Python 2 and Python 3, and it avoids the confusing switch from .iteritems to .items (on Python 2, .iteritems avoids building a list in memory; Python 3 fixes this by having .items return a lightweight view).
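To see why Python 3 no longer needs .iteritems, here is a small check (Python 3 only) that .items() returns a view object rather than a copied list. A view reflects later changes to the dict, and you only pay for a list when you explicitly ask for one.

```python
d = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}

items = d.items()
# On Python 3, items() returns a view object, not a new list
print(type(items).__name__)  # dict_items

# A view is a live window onto the dict: later changes show up in it
d["baz"] = 9
print(("baz", 9) in items)  # True

# Build a list only when you actually need one
pairs = list(d.items())
```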
@Prof.Falken mentioned a solution to this problem:
from six import iteritems
which effectively fixes the cross-compatibility issue, but requires installing the third-party package six.
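A sketch of how that import would be used. Since six may not be installed, this version falls back to a small local shim (an assumption for illustration, not part of six) so it runs either way.

```python
# Use six.iteritems if six is installed; otherwise fall back to a local shim.
try:
    from six import iteritems
except ImportError:
    def iteritems(mapping):
        # Fallback: an iterator over (key, value) pairs
        # (on Python 2 this still builds a list first)
        return iter(mapping.items())

d = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}
result = [k for k, v in iteritems(d) if v > 6]
print(result)  # ['foo', 'bob'] on Python 3.7+
```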
However, I don't fully agree with @glarrain that this solution is more readable. That is up for debate and may just be personal preference, even though Python is supposed to have only one obvious way to do it. In my opinion it depends on the situation (e.g. you may have a long dictionary name you don't want to type twice, or you may want to give the values a more readable name).
Some interesting timings:
In Python 2, the second solution is faster; in Python 3 the two are almost exactly equal in raw speed.
$ python -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k, v in d.items() if v > 6]'
1000000 loops, best of 3: 0.772 usec per loop
$ python -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k, v in d.iteritems() if v > 6]'
1000000 loops, best of 3: 0.508 usec per loop
$ python -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k in d if d[k] > 6]'
1000000 loops, best of 3: 0.45 usec per loop
$ python3 -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k, v in d.items() if v > 6]'
1000000 loops, best of 3: 1.02 usec per loop
$ python3 -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k in d if d[k] > 6]'
1000000 loops, best of 3: 1.02 usec per loop
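However, these are only tests on a small dictionary. For huge dictionaries, I was fairly sure that avoiding the dictionary key lookup (d[k]) would make .items much faster.
This holds on Python 3 (2.47 s vs. 3.08 s, roughly 20% faster), while on Python 2 the difference is marginal (1.71 s vs. 1.75 s):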
$ python -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k in d if d[k] > 6]'
1 loops, best of 3: 1.75 sec per loop
$ python -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k, v in d.iteritems() if v > 6]'
1 loops, best of 3: 1.71 sec per loop
$ python3 -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k in d if d[k] > 6]'
1 loops, best of 3: 3.08 sec per loop
$ python3 -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k, v in d.items() if v > 6]'
1 loops, best of 3: 2.47 sec per loop
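The same comparison can be run from inside Python 3 with the standard timeit module. This is a sketch: the large dictionary is scaled down to 200,000 keys (an assumption, to keep the run short), and the absolute numbers will vary by machine and Python version, so no expected output is shown.

```python
import timeit

# The two filtering expressions being compared
STATEMENTS = {
    "items":  "[k for k, v in d.items() if v > 6]",
    "lookup": "[k for k in d if d[k] > 6]",
}

def best_times(setup, number, repeat=3):
    """Best-of-`repeat` total seconds for running each statement `number` times."""
    return {
        name: min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
        for name, stmt in STATEMENTS.items()
    }

small_setup = 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}'
# Scaled down from the 20-million-key test above
large_setup = "d = {i: i for i in range(-100_000, 100_000)}"

small_times = best_times(small_setup, number=100_000)
large_times = best_times(large_setup, number=5)
print(small_times)
print(large_times)
```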
Solution 2:
To get just the values, use dictionary.values().
To get key-value pairs, use dictionary.items().
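A short illustration (Python 3.7+ assumed for the ordering in the comments): values() alone is enough to test the values, but it loses the keys, so items() is what you need to return the matching keys as the question asks.

```python
dictionary = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}

# values(): iterate over the values only (the keys are not available here)
big_values = [v for v in dictionary.values() if v > 6]
print(big_values)  # [12, 17]

# items(): iterate over (key, value) pairs, so you can test the value
# and keep the key
big_keys = [k for k, v in dictionary.items() if v > 6]
print(big_keys)  # ['foo', 'bob']
```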