Using list.remove() in nested for loops [duplicate]
Solution 1:
You can use a list comprehension to create a new list containing only the elements you don't want to remove:
somelist = [x for x in somelist if not determine(x)]
Or, by assigning to the slice somelist[:]
, you can mutate the existing list to contain only the items you want:
somelist[:] = [x for x in somelist if not determine(x)]
This approach could be useful if there are other references to somelist
that need to reflect the changes.
Instead of a comprehension, you could also use itertools
. In Python 2:
from itertools import ifilterfalse
somelist[:] = ifilterfalse(determine, somelist)
Or in Python 3:
from itertools import filterfalse
somelist[:] = filterfalse(determine, somelist)
Solution 2:
The answers suggesting list comprehensions are ALMOST correct -- except that they build a completely new list and then give it the same name the old list as, they do NOT modify the old list in place. That's different from what you'd be doing by selective removal, as in @Lennart's suggestion -- it's faster, but if your list is accessed via multiple references the fact that you're just reseating one of the references and NOT altering the list object itself can lead to subtle, disastrous bugs.
Fortunately, it's extremely easy to get both the speed of list comprehensions AND the required semantics of in-place alteration -- just code:
somelist[:] = [tup for tup in somelist if determine(tup)]
Note the subtle difference with other answers: this one is NOT assigning to a barename - it's assigning to a list slice that just happens to be the entire list, thereby replacing the list contents within the same Python list object, rather than just reseating one reference (from previous list object to new list object) like the other answers.
Solution 3:
You need to take a copy of the list and iterate over it first, or the iteration will fail with what may be unexpected results.
For example (depends on what type of list):
for tup in somelist[:]:
etc....
An example:
>>> somelist = range(10)
>>> for x in somelist:
... somelist.remove(x)
>>> somelist
[1, 3, 5, 7, 9]
>>> somelist = range(10)
>>> for x in somelist[:]:
... somelist.remove(x)
>>> somelist
[]
Solution 4:
for i in range(len(somelist) - 1, -1, -1):
if some_condition(somelist, i):
del somelist[i]
You need to go backwards otherwise it's a bit like sawing off the tree-branch that you are sitting on :-)
Python 2 users: replace range
by xrange
to avoid creating a hardcoded list
Solution 5:
Overview of workarounds
Either:
-
use a linked list implementation/roll your own.
A linked list is the proper data structure to support efficient item removal, and does not force you to make space/time tradeoffs.
A CPython
list
is implemented with dynamic arrays as mentioned here, which is not a good data type to support removals.There doesn't seem to be a linked list in the standard library however:
- Is there a linked list predefined library in Python?
- https://github.com/ajakubek/python-llist
-
start a new
list()
from scratch, and.append()
back at the end as mentioned at: https://stackoverflow.com/a/1207460/895245This time efficient, but less space efficient because it keeps an extra copy of the array around during iteration.
-
use
del
with an index as mentioned at: https://stackoverflow.com/a/1207485/895245This is more space efficient since it dispenses the array copy, but it is less time efficient, because removal from dynamic arrays requires shifting all following items back by one, which is O(N).
Generally, if you are doing it quick and dirty and don't want to add a custom LinkedList
class, you just want to go for the faster .append()
option by default unless memory is a big concern.
Official Python 2 tutorial 4.2. "for Statements"
https://docs.python.org/2/tutorial/controlflow.html#for-statements
This part of the docs makes it clear that:
- you need to make a copy of the iterated list to modify it
- one way to do it is with the slice notation
[:]
If you need to modify the sequence you are iterating over while inside the loop (for example to duplicate selected items), it is recommended that you first make a copy. Iterating over a sequence does not implicitly make a copy. The slice notation makes this especially convenient:
>>> words = ['cat', 'window', 'defenestrate'] >>> for w in words[:]: # Loop over a slice copy of the entire list. ... if len(w) > 6: ... words.insert(0, w) ... >>> words ['defenestrate', 'cat', 'window', 'defenestrate']
Python 2 documentation 7.3. "The for statement"
https://docs.python.org/2/reference/compound_stmts.html#for
This part of the docs says once again that you have to make a copy, and gives an actual removal example:
Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
if x < 0: a.remove(x)
However, I disagree with this implementation, since .remove()
has to iterate the entire list to find the value.
Could Python do this better?
It seems like this particular Python API could be improved. Compare it, for instance, with:
- Java ListIterator::remove which documents "This call can only be made once per call to next or previous"
- C++
std::vector::erase
which returns a valid interator to the element after the one removed
both of which make it crystal clear that you cannot modify a list being iterated except with the iterator itself, and gives you efficient ways to do so without copying the list.
Perhaps the underlying rationale is that Python lists are assumed to be dynamic array backed, and therefore any type of removal will be time inefficient anyways, while Java has a nicer interface hierarchy with both ArrayList
and LinkedList
implementations of ListIterator
.
There doesn't seem to be an explicit linked list type in the Python stdlib either: Python Linked List