What is the fastest way to flatten arbitrarily nested lists in Python? [duplicate]
Solution 1:
Here's a recursive approach that is string friendly:
nests = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]
def flatten(container):
for i in container:
if isinstance(i, (list,tuple)):
for j in flatten(i):
yield j
else:
yield i
print list(flatten(nests))
returns:
[1, 2, 3, 4, 5, 'hi', 6, 7, 'hello']
Note, this doesn't make any guarantees for speed or overhead use, but illustrates a recursive solution that hopefully will be helpful.
Solution 2:
It doesn't have to be recursive. In fact, an iterative solution is often faster because of the overhead involved in function calls. Here's an iterative version I wrote a while back:
def flatten(items, seqtypes=(list, tuple)):
for i, x in enumerate(items):
while i < len(items) and isinstance(items[i], seqtypes):
items[i:i+1] = items[i]
return items
Haven't tested the performance of this specific implementation, but it is probably not so great because of all the slice assignments, which could end up moving a lot of memory around. Still, don't assume it has to be recursive, or that it's simpler to write it that way.
This implementation does have the advantage of flattening the list "in place" rather than returning a copy, as recursive solutions invariably do. This could be useful when memory is tight. If you want a flattened copy, just pass in a shallow copy of the list you want to flatten:
flatten(mylist) # flattens existing list
newlist = flatten(mylist[:]) # makes a flattened copy
Also, this algorithm is not limited by the Python recursion limit because it's not recursive. I'm certain this will virtually never come into play, however.
2021 edit: it occurs to me that the check for the end of the list might be better handled with try
/except
because it will happen only once, and getting the test out of the main loop could provide a performance beneft. That would look like:
def flatten(items, seqtypes=(list, tuple)):
try:
for i, x in enumerate(items):
while isinstance(items[i], seqtypes):
items[i:i+1] = items[i]
except IndexError:
pass
return items
With some further tweaking to use the x
returned by enumerate
instead of accessing items[i]
so much, you get this, which is either mildly or significantly faster than the original version up top, depending on the size and structure of your lists.
def flatten(items, seqtypes=(list, tuple)):
try:
for i, x in enumerate(items):
while isinstance(x, seqtypes):
items[i:i+1] = x
x = items[i]
except IndexError:
pass
return items
Solution 3:
This function should be able to quickly flatten nested, iterable containers without using any recursion:
import collections
def flatten(iterable):
iterator = iter(iterable)
array, stack = collections.deque(), collections.deque()
while True:
try:
value = next(iterator)
except StopIteration:
if not stack:
return tuple(array)
iterator = stack.pop()
else:
if not isinstance(value, str) \
and isinstance(value, collections.Iterable):
stack.append(iterator)
iterator = iter(value)
else:
array.append(value)
After about five years, my opinion on the matter has changed, and this may be even better to use:
def main():
data = [1, 2, [3, 4, [5], []], [6]]
print(list(flatten(data)))
def flatten(iterable):
iterator, sentinel, stack = iter(iterable), object(), []
while True:
value = next(iterator, sentinel)
if value is sentinel:
if not stack:
break
iterator = stack.pop()
elif isinstance(value, str):
yield value
else:
try:
new_iterator = iter(value)
except TypeError:
yield value
else:
stack.append(iterator)
iterator = new_iterator
if __name__ == '__main__':
main()