How do I read two lines from a file at a time using python

python

I am coding a python script that parses a text file. The format of this text file is such that each element in the file uses two lines and for convenience I would like to read both lines before parsing. Can this be done in Python?

I would like to do something like:

f = open(filename, "r")
for line in f:
    line1 = line
    line2 = f.readline()

f.close()

But this breaks saying that:

ValueError: Mixing iteration and read methods would lose data

What is the most “pythonic” way to iterate over a list in chunks?

Similar question here. You can't mix iteration and readline so you need to use one or the other.

while True:
    line1 = f.readline()
    line2 = f.readline()
    if not line2: break  # EOF
    ...
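As a minimal sketch, the readline() loop above can be wrapped in a generator so the pairing logic lives in one place. The name read_pairs and the in-memory sample are hypothetical, used only for illustration. Unlike the loop above, this sketch also yields a trailing unpaired line (with None as its partner) instead of silently dropping it, which is an assumption about what you would want.

```python
import io

def read_pairs(f):
    """Yield (line1, line2) tuples using readline() only (no iteration)."""
    while True:
        line1 = f.readline()
        line2 = f.readline()
        if not line2:  # EOF reached, possibly with one unpaired line left
            if line1:
                yield line1, None  # assumption: report the leftover line
            return
        yield line1, line2

# Hypothetical sample standing in for a real file with five lines.
sample = io.StringIO("a\nb\nc\nd\ne\n")
pairs = list(read_pairs(sample))
```

A blank line in the file is read as "\n", which is truthy, so only a real end of file stops the loop.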

import itertools
with open('a') as f:
    for line1,line2 in itertools.zip_longest(*[f]*2):
        print(line1,line2)

itertools.zip_longest() returns an iterator, so it'll work well even if the file is billions of lines long.

If there are an odd number of lines, then line2 is set to None on the last iteration.
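To see the odd-line behaviour without creating a file, here is a small sketch that uses io.StringIO as a stand-in for the file object. The sample contents are made up for illustration.

```python
import io
import itertools

# Hypothetical five-line "file": two complete records and one dangling line.
sample = io.StringIO("name1\nvalue1\nname2\nvalue2\nname3\n")

# The same iterator is passed twice, so each tuple takes two successive lines.
pairs = list(itertools.zip_longest(*[sample] * 2))
```

Here the last tuple is ("name3\n", None), so checking line2 is None is enough to detect an incomplete record.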

On Python 2, you need to use itertools.izip_longest instead.

In the comments, it has been asked if this solution reads the whole file first, and then iterates over the file a second time. I believe that it does not. The with open('a') as f line opens a file handle, but does not read the file. f is an iterator, so its contents are not read until requested. zip_longest takes iterators as arguments, and returns an iterator.

zip_longest is indeed fed the same iterator, f, twice. But what ends up happening is that next(f) is called on the first argument and then on the second argument. Since next() is being called on the same underlying iterator, successive lines are yielded. This is very different than reading in the whole file. Indeed the purpose of using iterators is precisely to avoid reading in the whole file.

I therefore believe the solution works as desired -- the file is only read once by the for-loop.
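The laziness can also be checked directly. The sketch below replaces the file with a hypothetical generator, numbered_lines, that records every item it hands out, so it is possible to see how much zip_longest has actually pulled from it.

```python
import itertools

pulled = []  # indices of the "lines" that have actually been requested

def numbered_lines(n):
    """Hypothetical stand-in for a file: yields n lines, logging each request."""
    for i in range(n):
        pulled.append(i)
        yield "line %d\n" % i

source = numbered_lines(1000)
pairs = itertools.zip_longest(*[source] * 2)

pulled_before = list(pulled)  # nothing has been consumed yet
first = next(pairs)           # consumes exactly two items from source
pulled_after = list(pulled)
```

Building the zip_longest object pulls nothing, and asking for one pair pulls exactly two items, which matches the explanation above.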

To corroborate this, I ran the zip_longest solution versus a solution using f.readlines(). I put an input() at the end to pause the scripts, and ran ps axuw on each:

% ps axuw | grep zip_longest_method.py

unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile

% ps axuw | grep readlines_method.py

unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile

The readlines version clearly reads in the whole file at once. Since the zip_longest version uses much less memory, I think it is safe to conclude it is not reading in the whole file at once.

Use next(), e.g.:

with open("file") as f:
    for line in f:
        print(line)
        nextline = next(f)
        print("next line", nextline)
        ....
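One caveat: if the file has an odd number of lines, next(f) raises StopIteration on the last pass. A minimal sketch of one way around that is the two-argument form of the built-in next(), which returns a default instead of raising. The helper name pairs_with_next and the sample data are hypothetical.

```python
import io

def pairs_with_next(f):
    """Yield (line1, line2); line2 is None if the file ends on an odd line."""
    for line1 in f:
        line2 = next(f, None)  # default avoids StopIteration at end of file
        yield line1, line2

# Hypothetical three-line "file" standing in for a real one.
sample = io.StringIO("first\nsecond\nthird\n")
result = list(pairs_with_next(sample))
```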

I would proceed in a similar way as ghostdog74, only with the try outside and a few modifications:

try:
    with open(filename) as f:
        for line1 in f:
            line2 = next(f)  # raises StopIteration if line1 has no partner
            # process line1 and line2 here
except StopIteration:
    print("(End)")  # do whatever you need to do with line1 alone

This keeps the code simple and yet robust. Using with closes the file if an exception is raised, and also closes it once the file is exhausted and the loop exits normally.

Note that the with statement requires Python 2.6, or 2.5 with from __future__ import with_statement.
