zip iterators asserting for equal length in python
I am looking for a nice way to zip several iterables, raising an exception if the lengths of the iterables are not equal.
In the case where the iterables are lists or otherwise support len(), this solution is clean and easy:
def zip_equal(it1, it2):
    if len(it1) != len(it2):
        raise ValueError("Lengths of iterables are different")
    return zip(it1, it2)
However, if it1 and it2 are generators, the previous function fails because the length is not defined: TypeError: object of type 'generator' has no len().
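To make the failure concrete, here is a minimal sketch. The name zip_equal_len is just a hypothetical rename of the function above so it does not clash with the later version:

```python
def zip_equal_len(it1, it2):
    # Same length check as above; only works for arguments that support len().
    if len(it1) != len(it2):
        raise ValueError("Lengths of iterables are different")
    return zip(it1, it2)

# Lists support len(), so this works.
pairs = list(zip_equal_len(['a', 'b'], [0, 1]))

# Generators do not, so the length check itself raises TypeError.
gen1 = (x for x in ['a', 'b'])
gen2 = (x for x in [0, 1])
error_message = None
try:
    zip_equal_len(gen1, gen2)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```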
I imagine the itertools module offers a simple way to implement that, but so far I have not been able to find it. I have come up with this home-made solution:
def zip_equal(it1, it2):
    exhausted = False
    while True:
        try:
            el1 = next(it1)
            if exhausted:  # in a previous iteration it2 was exhausted but it1 still has elements
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            exhausted = True
            # it2 must be exhausted too.
        try:
            el2 = next(it2)
            # here it2 is not exhausted.
            if exhausted:  # it1 was exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            # here it2 is exhausted
            if not exhausted:
                # but it1 was not exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
            exhausted = True
        if not exhausted:
            yield (el1, el2)
        else:
            return
The solution can be tested with the following code:
it1 = (x for x in ['a', 'b', 'c']) # it1 has length 3
it2 = (x for x in [0, 1, 2, 3]) # it2 has length 4
list(zip_equal(it1, it2)) # len(it1) < len(it2) => raise
it1 = (x for x in ['a', 'b', 'c']) # it1 has length 3
it2 = (x for x in [0, 1, 2, 3]) # it2 has length 4
list(zip_equal(it2, it1)) # len(it2) > len(it1) => raise
it1 = (x for x in ['a', 'b', 'c', 'd']) # it1 has length 4
it2 = (x for x in [0, 1, 2, 3]) # it2 has length 4
list(zip_equal(it1, it2)) # like zip (or izip in python2)
Am I overlooking any alternative solution? Is there a simpler implementation of my zip_equal function?
Update:
- Requiring python 3.10 or newer, see Asocia's answer
- Thorough performance benchmarking and best performing solution on python<3.10: Stefan's answer
- Simple answer without external dependencies: Martijn Pieters' answer (please check the comments for a bugfix in some corner cases)
- More complex than Martijn's, but with better performance: cjerdonek's answer
- If you don't mind a package dependency, see pylang's answer
An optional boolean keyword argument, strict, is introduced for the built-in zip function in PEP 618.
Quoting What’s New In Python 3.10:
The zip() function now has an optional strict flag, used to require that all the iterables have an equal length.
When enabled, a ValueError is raised if one of the arguments is exhausted before the others.
>>> list(zip('ab', range(3)))
[('a', 0), ('b', 1)]
>>> list(zip('ab', range(3), strict=True))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: zip() argument 2 is longer than argument 1
I can think of a simpler solution: use itertools.zip_longest() and raise an exception if the sentinel value used to pad out shorter iterables is present in the tuple produced:
from itertools import zip_longest

def zip_equal(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if sentinel in combo:
            raise ValueError('Iterables have different lengths')
        yield combo
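A possible corner case: the in test falls back to == when the identity check fails, so items with an unusual __eq__ can be mistaken for the sentinel. The sketch below is a variant that compares by identity only. The names zip_equal_identity and AlwaysEqual are hypothetical and exist only for this illustration:

```python
from itertools import zip_longest

def zip_equal_identity(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # Identity test only: never calls __eq__ on the items themselves.
        if any(item is sentinel for item in combo):
            raise ValueError('Iterables have different lengths')
        yield combo

class AlwaysEqual:
    """Hypothetical item type that claims to be equal to anything."""
    def __eq__(self, other):
        return True

# A containment test is fooled by such an item...
fooled = object() in (AlwaysEqual(), 1)

# ...while the identity-based variant zips equal-length inputs without complaint.
result = list(zip_equal_identity([AlwaysEqual()], [1]))

# Unequal lengths are still detected.
raised = False
try:
    list(zip_equal_identity([1, 2], [1]))
except ValueError:
    raised = True
```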
Unfortunately, we can't use zip() with yield from to avoid a Python-code loop with a test each iteration; once the shortest iterator runs out, zip() would advance all preceding iterators and thus swallow the evidence if there is but one extra item in those.
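To illustrate the swallowing, here is a small sketch with one extra item in the first argument. The variable names are only for illustration:

```python
# The longer iterator comes first, so zip() pulls its third item
# before discovering that the second iterator is exhausted.
longer = iter([1, 2, 3])
shorter = iter([10, 20])
pairs = list(zip(longer, shorter))
leftover = list(longer)   # the extra item is already gone

# With the arguments swapped, zip() stops as soon as the first
# (shorter) iterator runs out, so the extra item survives.
longer2 = iter([1, 2, 3])
shorter2 = iter([10, 20])
pairs2 = list(zip(shorter2, longer2))
leftover2 = list(longer2)

print(pairs, leftover)
print(pairs2, leftover2)
```

So checking afterwards whether any input still has items only catches the mismatch when the longer iterable comes after the shortest one.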