python equivalent of filter() getting two output lists (i.e. partition of a list)
Let's say I have a list, and a filtering function. Using something like
>>> filter(lambda x: x > 10, [1,4,12,7,42])
[12, 42]
I can get the elements matching the criterion. Is there a function I could use that would output two lists, one of elements matching, one of the remaining elements? I could call the filter()
function twice, but that's kinda ugly :)
Edit: the order of elements should be conserved, and I may have identical elements multiple times.
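For reference, the double-filter version I'd like to avoid looks something like this (wrapping in list() where filter returns an iterator, as on Python 3):
>>> pred = lambda x: x > 10
>>> data = [1, 4, 12, 7, 42]
>>> list(filter(pred, data)), list(filter(lambda x: not pred(x), data))
([12, 42], [1, 4, 7])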
Solution 1:
Try this:
def partition(pred, iterable):
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses
Usage:
>>> trues, falses = partition(lambda x: x > 10, [1,4,12,7,42])
>>> trues
[12, 42]
>>> falses
[1, 4, 7]
There is also an implementation suggestion in itertools recipes:
from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
The recipe comes from the Python 3.x documentation. In Python 2.x, filterfalse is called ifilterfalse.
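Note that the recipe hands back two lazy iterators rather than lists; a quick usage sketch, wrapping the results in list() to materialize them:
>>> falses, trues = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
>>> list(falses), list(trues)
([1, 4, 7], [12, 42])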
Solution 2:
>>> def partition(l, p):
...     return reduce(lambda x, y: (x[0]+[y], x[1]) if p(y) else (x[0], x[1]+[y]), l, ([], []))
...
>>> partition([1, 2, 3, 4, 5], lambda x: x < 3)
([1, 2], [3, 4, 5])
and a little uglier but faster version of the above code:
def partition(l, p):
    return reduce(lambda x, y: x[0].append(y) or x if p(y) else x[1].append(y) or x, l, ([], []))
This is the second edit, but I think it matters:
def partition(l, p):
    return reduce(lambda x, y: x[not p(y)].append(y) or x, l, ([], []))
The second and third versions are as quick as the iterative one above, but with less code.
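On Python 3 these need reduce imported from functools, since it is no longer a builtin; a minimal sketch of the last version adjusted accordingly:
from functools import reduce  # on Python 3, reduce lives in functools

def partition(l, p):
    # index 0 collects items for which p(y) is true, index 1 the rest
    return reduce(lambda x, y: x[not p(y)].append(y) or x, l, ([], []))

print(partition([1, 2, 3, 4, 5], lambda x: x < 3))  # ([1, 2], [3, 4, 5])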
Solution 3:
TL;DR
The accepted, most voted answer [1] by Mark Byers

def partition(pred, iterable):
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses

is the simplest and the fastest.
Benchmarking the different approaches
The different approaches that had been suggested can be classified broadly in three categories:

- straightforward list manipulation via lis.append, returning a 2-tuple of lists,
- lis.append mediated by a functional approach, returning a 2-tuple of lists,
- using the canonical recipe given in the itertools fine documentation, returning a 2-tuple of, loosely speaking, generators.
Here follows a vanilla implementation of the three techniques: first the functional approach, then itertools, and eventually two different implementations of direct list manipulation, the alternative being the False-is-zero, True-is-one trick.

Note that this is Python 3 (hence reduce comes from functools) and that the OP requests a tuple like (positives, negatives), but my implementations all return (negatives, positives)…
$ ipython
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import functools
...:
...: def partition_fu(p, l, r=functools.reduce):
...:     return r(lambda x, y: x[p(y)].append(y) or x, l, ([], []))
...:
In [2]: import itertools
...:
...: def partition_it(pred, iterable,
...:                  filterfalse=itertools.filterfalse,
...:                  tee=itertools.tee):
...:     t1, t2 = tee(iterable)
...:     return filterfalse(pred, t1), filter(pred, t2)
...:
In [3]: def partition_li(p, l):
...:     a, b = [], []
...:     for n in l:
...:         if p(n):
...:             b.append(n)
...:         else:
...:             a.append(n)
...:     return a, b
...:
In [4]: def partition_li_alt(p, l):
...:     x = [], []
...:     for n in l: x[p(n)].append(n)
...:     return x
...:
We need a predicate to apply, and lists (again, loosely speaking) on which to operate.
In [5]: p = lambda n:n%2
In [6]: five, ten = range(50000), range(100000)
To overcome the problem in testing the itertools approach that was reported by joeln on Oct 31 '13 at 6:17:

"Nonsense. You've calculated the time taken to construct the generators in filterfalse and filter, but you've not iterated through the input or called pred once! The advantage of the itertools recipe is that it does not materialise any list, or look further ahead in the input than necessary. It calls pred twice as often and takes almost twice as long as Byers et al."
I have thought of an empty loop that just instantiates all the pairs of elements in the two iterables returned by the different partition functions.
First we use two fixed lists to get an idea of the overhead implied (using IPython's very convenient %timeit magic):
In [7]: %timeit for e, o in zip(five, five): pass
4.21 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Next we use the different implementations, one after the other
In [8]: %timeit for e, o in zip(*partition_fu(p, ten)): pass
53.9 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [9]: %timeit for e, o in zip(*partition_it(p, ten)): pass
44.5 ms ± 3.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [10]: %timeit for e, o in zip(*partition_li(p, ten)): pass
36.3 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [11]: %timeit for e, o in zip(*partition_li_alt(p, ten)): pass
37.3 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [12]:
Comments
The plainest of the approaches is also the fastest one.
Using the x[p(n)] trick is, ehm, useless because at every step you have to index a data structure, giving you a slight penalty. It is, however, nice to know if you want to persuade a survivor of a declining culture to try pythonizing.
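The trick relies on False == 0 and True == 1 (and, with the n % 2 predicate used here, on the predicate already returning 0 or 1), so the result of the predicate can index the two-element tuple directly; a tiny illustration:
>>> x = ([], [])
>>> p = lambda n: n % 2        # 1 for odd numbers, 0 for even ones
>>> x[p(3)].append(3)          # odd values land in x[1]
>>> x[p(4)].append(4)          # even values land in x[0]
>>> x
([4], [3])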
The functional approach, which is operationally equivalent to the alternative append implementation, is ~50% slower, possibly because we have an extra function call (on top of the predicate evaluation) for each list element.
The itertools approach has the (customary) advantages that ❶ no potentially large list is instantiated and ❷ the input list is not entirely processed if you break out of the consumer loop; but when we use it here it is slower because of the need to apply the predicate on both ends of the tee.
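As a small sketch of advantage ❷ (restating the tee-based recipe so the snippet is self-contained): if the consumer stops early, only a prefix of the input is ever pulled through the tee.
from itertools import filterfalse, islice, tee

def partition_it(pred, iterable):
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

evens, odds = partition_it(lambda n: n % 2, range(100000))
print(list(islice(odds, 3)))  # [1, 3, 5]; only the first few inputs are consumed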
Aside
I've fallen in love with the object.mutate() or object idiom that was shown by Marii in their answer giving a functional approach to the problem. I'm afraid that, sooner or later, I'm going to abuse it.
Footnotes
[1] Accepted and most voted as of today, Sep 14 2017 (but of course I have the highest hopes for this answer of mine!)
Solution 4:
I think groupby might be more relevant here:
http://docs.python.org/library/itertools.html#itertools.groupby
For example, splitting a list into odd and even numbers (or could be an arbitrary number of groups):
>>> l=range(6)
>>> key=lambda x: x % 2 == 0
>>> from itertools import groupby
>>> {k:list(g) for k,g in groupby(sorted(l,key=key),key=key)}
{False: [1, 3, 5], True: [0, 2, 4]}
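If you need the (matching, non-matching) pair from the question rather than a dict, you could wrap this up as below; the .get calls are there because one group is absent whenever every element falls on the same side:
>>> from itertools import groupby
>>> def partition(pred, iterable):
...     key = lambda x: bool(pred(x))
...     groups = {k: list(g) for k, g in groupby(sorted(iterable, key=key), key=key)}
...     return groups.get(True, []), groups.get(False, [])
...
>>> partition(lambda x: x > 10, [1, 4, 12, 7, 42])
([12, 42], [1, 4, 7])
Since sorted is stable, the relative order within each group is preserved.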