Move all zeroes to the beginning of a list in Python
I have a list like below:
a = [4, 5, 0, 0, 6, 7, 0, 1, 0, 5]
and I want to push all zeroes to the beginning of that list. The result must be like below.
a = [0, 0, 0, 0, 4, 5, 6, 7, 1, 5]
How to do it in Python 2?
You could sort the list:
a.sort(key=lambda v: v != 0)
The key
function tells Python to sort values by wether or not they are 0
. False
is sorted before True
, and values are then sorted based on their original relative position.
For 0
, False
is returned, sorting all those values first. For the rest True
is returned, leaving sort to put them last but leave their relative positions untouched.
Demo:
>>> a = [4, 5, 0, 0, 6, 7, 0, 1, 0, 5]
>>> a.sort(key=lambda v: v != 0)
>>> a
[0, 0, 0, 0, 4, 5, 6, 7, 1, 5]
This can be done without sorting.
Solutions
Initialization:
In [8]: a = [4, 5, 0, 0, 6, 7, 0, 1, 0, 5]
In [9]: from itertools import compress, repeat, chain
list.count
and itertools.compress
In [10]: x = [0] * a.count(0); x.extend(compress(a, a))
In [11]: x
Out[11]: [0, 0, 0, 0, 4, 5, 6, 7, 1, 5]
Same as before, but without list.count
In [12]: c = list(compress(a, a)); [0] * (len(a) - len(c)) + c
Out[12]: [0, 0, 0, 0, 4, 5, 6, 7, 1, 5]
list.count
, itertools.compress
, itertools.repeat
, itertools.chain
In [13]: list(chain(repeat(0, a.count(0)), compress(a, a)))
Out[13]: [0, 0, 0, 0, 4, 5, 6, 7, 1, 5]
Same as the previous one, but without list.count
In [14]: c = list(compress(a, a)); list(chain(repeat(0, len(a) - len(c)), c))
Out[14]: [0, 0, 0, 0, 4, 5, 6, 7, 1, 5]
Benchmarks
For small lists:
In [21]: %timeit x = [0] * a.count(0); x.extend(compress(a, a))
1000000 loops, best of 3: 583 ns per loop
In [22]: %timeit c = list(compress(a, a)); [0] * (len(a) - len(c)) + c
1000000 loops, best of 3: 661 ns per loop
In [23]: %timeit list(chain(repeat(0, a.count(0)), compress(a, a)))
1000000 loops, best of 3: 762 ns per loop
In [24]: %timeit c = list(compress(a, a)); list(chain(repeat(0, len(a) - len(c)), c))
1000000 loops, best of 3: 900 ns per loop
For large lists:
In [28]: a *= 10000000
In [29]: %timeit x = [0] * a.count(0); x.extend(compress(a, a))
1 loops, best of 3: 1.43 s per loop
In [30]: %timeit c = list(compress(a, a)); [0] * (len(a) - len(c)) + c
1 loops, best of 3: 1.37 s per loop
In [31]: %timeit list(chain(repeat(0, a.count(0)), compress(a, a)))
1 loops, best of 3: 1.79 s per loop
In [32]: %timeit c = list(compress(a, a)); list(chain(repeat(0, len(a) - len(c)), c))
1 loops, best of 3: 1.47 s per loop
As you can see, in some cases itertools
-based solutions tend to be slower, because of the big number of function calls.
Here are some better timings, with two new methods:
SETUP="
from itertools import compress, repeat, chain
a = [4, 5, 0, 0, 6, 7, 0, 1, 0, 5]
"
First the sorting:
python -m timeit -s "$SETUP" "a.sort(key=bool)"
# 1000000 loops, best of 3: 1.51 usec per loop
Then frostnational's methods:
python -m timeit -s "$SETUP" "list(chain(repeat(0, a.count(0)), compress(a, a)))"
# 1000000 loops, best of 3: 1.16 usec per loop
python -m timeit -s "$SETUP" "cs = list(compress(a, a)); list(chain(repeat(0, len(a)-len(cs)), cs))"
# 1000000 loops, best of 3: 1.37 usec per loop
Then methods more directly working from lists:
python -m timeit -s "$SETUP" "[0] * a.count(0) + list(filter(bool, a))"
# 1000000 loops, best of 3: 1.04 usec per loop
python -m timeit -s "$SETUP" "nonzero = list(filter(bool, a)); [0] * (len(a)-len(nonzero)) + nonzero"
# 1000000 loops, best of 3: 0.87 usec per loop
And again with larger sized input:
SETUP="
from itertools import compress, repeat, chain
a = [4, 5, 0, 0, 6, 7, 0, 1, 0, 5] * 1000
"
Sorting:
python -m timeit -s "$SETUP" "a.sort(key=bool)"
# 1000 loops, best of 3: 1.08 msec per loop
frostnational's:
python -m timeit -s "$SETUP" "list(chain(repeat(0, a.count(0)), compress(a, a)))"
# 1000 loops, best of 3: 333 usec per loop
python -m timeit -s "$SETUP" "cs = list(compress(a, a)); list(chain(repeat(0, len(a)-len(cs)), cs))"
# 1000 loops, best of 3: 206 usec per loop
New:
python -m timeit -s "$SETUP" "[0] * a.count(0) + list(filter(bool, a))"
# 1000 loops, best of 3: 295 usec per loop
python -m timeit -s "$SETUP" "nonzero = list(filter(bool, a)); [0] * (len(a)-len(nonzero)) + nonzero"
# 10000 loops, best of 3: 143 usec per loop
That said, despite being relatively slow, Martijn Pieters' decision to use sorting is actually pretty competitive for reasonably-sized lists and premature optimisation is the root of all evil.
FWIW, here are some timings for very long lists:
SETUP="
from itertools import compress, repeat, chain
a = [4, 5, 0, 0, 6, 7, 0, 1, 0, 5] * 1000000
"
python -m timeit -s "$SETUP" "a.sort(key=bool)"
# 10 loops, best of 3: 1.21 sec per loop
python -m timeit -s "$SETUP" "list(chain(repeat(0, a.count(0)), compress(a, a)))"
# 10 loops, best of 3: 347 msec per loop
python -m timeit -s "$SETUP" "cs = list(compress(a, a)); list(chain(repeat(0, len(a)-len(cs)), cs))"
# 10 loops, best of 3: 226 msec per loop
python -m timeit -s "$SETUP" "[0] * a.count(0) + list(filter(bool, a))"
# 10 loops, best of 3: 310 msec per loop
python -m timeit -s "$SETUP" "nonzero = list(filter(bool, a)); [0] * (len(a)-len(nonzero)) + nonzero"
# 10 loops, best of 3: 153 msec per loop