What is the most efficient way to search nested lists in python?
I have a list that contains nested lists and I need to know the most efficient way to search within those nested lists.
e.g., if I have
[['a','b','c'],
['d','e','f']]
and I have to search the entire list above, what is the most efficient way to find 'd'?
Solution 1:
>>> lis=[['a','b','c'],['d','e','f']]
>>> any('d' in x for x in lis)
True
generator expression using any
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('d' in x for x in lis)"
1000000 loops, best of 3: 1.32 usec per loop
generator expression
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 1.56 usec per loop
list comprehension
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 3.23 usec per loop
How about if the item is near the end, or not present at all? any
is faster than the list comprehension
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]"
"'NOT THERE' in [y for x in lis for y in x]"
100000 loops, best of 3: 4.4 usec per loop
$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]"
"any('NOT THERE' in x for x in lis)"
100000 loops, best of 3: 3.06 usec per loop
Perhaps if the list is 1000 times longer? any
is still faster
$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]"
"'NOT THERE' in [y for x in lis for y in x]"
100 loops, best of 3: 3.74 msec per loop
$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]"
"any('NOT THERE' in x for x in lis)"
100 loops, best of 3: 2.48 msec per loop
We know that generators take a while to set up, so the best chance for the LC to win is a very short list
$ python -m timeit -s "lis=[['a','b','c']]"
"any('c' in x for x in lis)"
1000000 loops, best of 3: 1.12 usec per loop
$ python -m timeit -s "lis=[['a','b','c']]"
"'c' in [y for x in lis for y in x]"
1000000 loops, best of 3: 0.611 usec per loop
And any
uses less memory too
Solution 2:
Using list comprehension, given:
mylist = [['a','b','c'],['d','e','f']]
'd' in [j for i in mylist for j in i]
yields:
True
and this could also be done with a generator (as shown by @AshwiniChaudhary)
Update based on comment below:
Here is the same list comprehension, but using more descriptive variable names:
'd' in [elem for sublist in mylist for elem in sublist]
The looping constructs in the list comprehension part is equivalent to
for sublist in mylist:
for elem in sublist
and generates a list that where 'd' can be tested against with the in
operator.
Solution 3:
Use a generator expression, here the whole list will not be traversed as generator generate results one by one:
>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True
>>> gen = (y for x in lis for y in x)
>>> 'd' in gen
True
>>> list(gen)
['e', 'f']
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 2.96 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 7.4 usec per loop