How to filter dictionary keys based on their corresponding values

I have:

dictionary = {"foo":12, "bar":2, "jim":4, "bob": 17}

I want to iterate over this dictionary, but over the values instead of the keys, so I can use the values in another function.

For example, I want to test which dictionary values are greater than 6, and then store their keys in a list. My code looks like this:

list = []
for c in dictionary:
    if c > 6:
        list.append(dictionary[c])
print list

and then, in a perfect world, list would feature all the keys whose value is greater than 6. However, my for loop is only iterating over the keys; I would like to change that to the values!

Any help is greatly appreciated. Thank you.


Solution 1:

>>> d = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}
>>> [k for k, v in d.items() if v > 6] # Use d.iteritems() on python 2.x
['bob', 'foo']

I'd like to just update this answer to also showcase the solution by @glarrain which I find myself tending to use nowadays.

[k for k in d if d[k] > 6]

This works unchanged on both Python 2 and Python 3, and avoids the confusing switch between .iteritems and .items. (On Python 2, .items builds a full list in memory and .iteritems avoids that; on Python 3, .items no longer builds a list and .iteritems is gone.)

@Prof.Falken mentioned a solution to this problem

from six import iteritems

which effectively fixes the cross-compatibility issue, BUT requires you to install the third-party package six.

However, I don't fully agree with @glarrain that this solution is more readable. That is up for debate and may just be personal preference, even though Python is supposed to have only one obvious way to do it. In my opinion it depends on the situation (e.g. you may have a long dictionary name you don't want to type twice, or you may want to give the values a more readable name).
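To make the readability trade-off concrete, here is a small sketch of both forms side by side. The name exam_scores_by_student is hypothetical, chosen only to show what a longer dictionary name does to each version.

```python
# Hypothetical data: same values as the question, under a longer name.
exam_scores_by_student = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}

# Key-only form: the dictionary name has to be written twice.
passed_lookup = [k for k in exam_scores_by_student if exam_scores_by_student[k] > 6]

# .items() form: the key and the value each get a descriptive name.
passed_items = [student for student, score in exam_scores_by_student.items() if score > 6]

# Both forms select the same keys; sorted() avoids relying on dict ordering.
print(sorted(passed_lookup))
print(sorted(passed_items))
```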

Some interesting timings:

In Python 2, the 2nd solution is faster; in Python 3, the two are almost exactly equal in raw speed.


$ python -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k, v in d.items() if v > 6]'
1000000 loops, best of 3: 0.772 usec per loop
$ python -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k, v in d.iteritems() if v > 6]'
1000000 loops, best of 3: 0.508 usec per loop
$ python -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k in d if d[k] > 6]'
1000000 loops, best of 3: 0.45 usec per loop

$ python3 -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k, v in d.items() if v > 6]'
1000000 loops, best of 3: 1.02 usec per loop
$ python3 -m timeit -s 'd = {"foo": 12, "bar": 2, "jim": 4, "bob": 17};' '[k for k in d if d[k] > 6]'
1000000 loops, best of 3: 1.02 usec per loop

However, these tests only cover small dictionaries. In huge dictionaries, I'm pretty sure that avoiding the dictionary key lookup (d[k]) would make .items faster, and this seems to be the case, though the gap is small on Python 2 (1.75 s vs 1.71 s) and clearer on Python 3 (3.08 s vs 2.47 s):

$ python -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k in d if d[k] > 6]'
1 loops, best of 3: 1.75 sec per loop
$ python -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k, v in d.iteritems() if v > 6]'
1 loops, best of 3: 1.71 sec per loop
$ python3 -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k in d if d[k] > 6]'
1 loops, best of 3: 3.08 sec per loop
$ python3 -m timeit -s 'd = {i: i for i in range(-10000000, 10000000)};' -n 1 '[k for k, v in d.items() if v > 6]'
1 loops, best of 3: 2.47 sec per loop
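
If you would rather repeat this comparison from inside Python 3 than from the shell, a rough sketch with the standard timeit module could look like the following. The dictionary size and the number/repeat values are arbitrary assumptions, much smaller than in the runs above, so the absolute numbers will differ.

```python
import timeit

# Assumed, much smaller dictionary than in the shell runs above.
setup = "d = {i: i for i in range(-100000, 100000)}"

lookup_stmt = "[k for k in d if d[k] > 6]"
items_stmt = "[k for k, v in d.items() if v > 6]"

# repeat() returns one total time per run; min() is the usual figure to compare.
lookup_time = min(timeit.repeat(lookup_stmt, setup=setup, number=5, repeat=3))
items_time = min(timeit.repeat(items_stmt, setup=setup, number=5, repeat=3))

print("d[k] lookup: %.4f s for 5 loops" % lookup_time)
print(".items():    %.4f s for 5 loops" % items_time)
```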

Solution 2:

To just get the values, use dictionary.values()

To get key value pairs, use dictionary.items()
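
As a minimal sketch of both calls applied to the dictionary from the question (Python 3 syntax; the variable names large_values and keys_over_six are only illustrative):

```python
dictionary = {"foo": 12, "bar": 2, "jim": 4, "bob": 17}

# .values() yields only the values, which is what the question's loop meant to test.
large_values = [v for v in dictionary.values() if v > 6]

# .items() yields (key, value) pairs, so keys can be collected while testing values.
keys_over_six = []
for key, value in dictionary.items():
    if value > 6:
        keys_over_six.append(key)

print(sorted(large_values))   # [12, 17]
print(sorted(keys_over_six))  # ['bob', 'foo']
```

Note that .values() alone cannot tell you which key a value came from, so .items() is the one to use when the keys are the end result.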