How do I read two lines from a file at a time using Python?
I am coding a python script that parses a text file. The format of this text file is such that each element in the file uses two lines and for convenience I would like to read both lines before parsing. Can this be done in Python?
I would like to do something like:
f = open(filename, "r")
for line in f:
    line1 = line
    line2 = f.readline()
f.close()
But this breaks saying that:
ValueError: Mixing iteration and read methods would lose data
Related:
- What is the most “pythonic” way to iterate over a list in chunks?
Similar question here. You can't mix iteration and readline so you need to use one or the other.
while True:
    line1 = f.readline()
    line2 = f.readline()
    if not line2:
        break  # EOF; a trailing unpaired line1 is dropped here
    ...
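If the file can have an odd number of lines, the same readline-only loop can be wrapped in a generator that still hands back the last line. This is only a minimal sketch: the name read_pairs is made up for illustration, and io.StringIO stands in for a real file object.

```python
import io

def read_pairs(f):
    """Yield (line1, line2) using only readline(); line2 is '' if there is no partner line."""
    while True:
        line1 = f.readline()
        if not line1:          # EOF before a new pair starts
            break
        line2 = f.readline()   # '' if the file ended right after line1
        yield line1, line2

# Hypothetical sample data standing in for an opened file
sample = io.StringIO("a\nb\nc\n")
pairs = list(read_pairs(sample))
print(pairs)
```

Checking line1 rather than line2 for EOF means an unpaired final line is yielded with an empty string as its partner instead of being skipped.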
import itertools
with open('a') as f:
    for line1, line2 in itertools.zip_longest(*[f]*2):
        print(line1, line2)
itertools.zip_longest() returns an iterator, so it'll work well even if the file is billions of lines long.
If there are an odd number of lines, then line2 is set to None on the last iteration.
On Python2 you need to use izip_longest instead.
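To make the odd-line case concrete, here is a small sketch that runs the same idiom on in-memory data. The helper name pairwise_lines and the sample text are assumptions for illustration; io.StringIO is used only so the example does not need a file on disk.

```python
import io
from itertools import zip_longest

def pairwise_lines(f):
    # [f] * 2 is a list holding the SAME iterator twice, so every tuple
    # is built from two consecutive lines of f.
    return zip_longest(*[f] * 2)

# Hypothetical three-line input: the last element has no second line
sample = io.StringIO("name: a\nvalue: 1\nname: b\n")

result = []
for line1, line2 in pairwise_lines(sample):
    second = None if line2 is None else line2.rstrip("\n")
    result.append((line1.rstrip("\n"), second))

print(result)
```

Because line2 can be None, any code that calls string methods on it should check for None first, as the loop above does.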
In the comments, it has been asked if this solution reads the whole file first, and then iterates over the file a second time.
I believe that it does not. The with open('a') as f line opens a file handle, but does not read the file. f is an iterator, so its contents are not read until requested. zip_longest takes iterators as arguments, and returns an iterator.
zip_longest is indeed fed the same iterator, f, twice. But what ends up happening is that next(f) is called on the first argument and then on the second argument. Since next() is being called on the same underlying iterator, successive lines are yielded. This is very different from reading in the whole file. Indeed the purpose of using iterators is precisely to avoid reading in the whole file.
I therefore believe the solution works as desired -- the file is only read once by the for-loop.
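One way to see the lazy behaviour described above is to wrap the source in a generator that records every item it hands out. This is a sketch with made-up names (logged_lines, consumed) and a plain list standing in for the file, not a measurement of real file I/O.

```python
from itertools import zip_longest

consumed = []  # every item the source has actually produced so far

def logged_lines(lines):
    """Yield items one by one, recording each as it is requested."""
    for line in lines:
        consumed.append(line)
        yield line

source = logged_lines(["l1", "l2", "l3", "l4"])
pairs = zip_longest(source, source)   # same iterator passed twice

consumed_before = list(consumed)      # snapshot before anything is requested
first = next(pairs)                   # pulls exactly two items from source
consumed_after_first = list(consumed)

print(consumed_before, first, consumed_after_first)
```

Creating the zip_longest object consumes nothing; each step of the loop pulls only the two items it needs.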
To corroborate this, I ran the zip_longest solution versus a solution using f.readlines(). I put an input() at the end to pause the scripts, and ran ps axuw on each:
% ps axuw | grep zip_longest_method.py
unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile
% ps axuw | grep readlines_method.py
unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile
The readlines method clearly reads in the whole file at once. Since the zip_longest method uses much less memory, I think it is safe to conclude it is not reading in the whole file at once.
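A similar comparison can be sketched from inside Python with the standard tracemalloc module. Note that this is a different measurement from the ps figures above: it tracks peak Python-level allocations rather than process memory, and the temporary file, its size, and the function names are all assumptions chosen for illustration.

```python
import os
import tempfile
import tracemalloc
from itertools import zip_longest

def peak_bytes(func, path):
    """Return the peak traced allocation (in bytes) while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def with_zip_longest(path):
    with open(path) as f:
        for line1, line2 in zip_longest(*[f] * 2):
            pass  # only the current pair is alive at any time

def with_readlines(path):
    with open(path) as f:
        lines = f.readlines()  # the whole file as a list of strings
        for i in range(0, len(lines), 2):
            pass

# Hypothetical test file: 50,000 short lines
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(50000))
    path = tmp.name

try:
    peak_zip = peak_bytes(with_zip_longest, path)
    peak_readlines = peak_bytes(with_readlines, path)
finally:
    os.remove(path)

print("zip_longest peak:", peak_zip, "readlines peak:", peak_readlines)
```

The exact numbers depend on the Python version and platform, so only the relative difference between the two peaks is meaningful.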
Use next(), e.g.:
with open("file") as f:
    for line in f:
        print(line)
        nextline = next(f)
        print("next line", nextline)
        ....
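Note that next(f) raises StopIteration if the file has an odd number of lines. A possible variation, sketched here under the assumption that a missing second line should show up as None, is to pass a default to next(). The generator name paired and the sample data are made up for illustration.

```python
import io

def paired(f):
    """Yield (line1, line2); line2 is None when the last line has no partner."""
    for line1 in f:
        line2 = next(f, None)  # the default avoids StopIteration at EOF
        yield line1, line2

# Hypothetical three-line input standing in for an opened file
sample = io.StringIO("1\n2\n3\n")
pairs = list(paired(sample))
print(pairs)
```

Both the for loop and next() go through the same iterator protocol, which is why this combination works while mixing iteration with readline() raised the ValueError in the question.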
I would proceed in a similar way as ghostdog74, only with the try outside and a few modifications:
try:
    with open(filename) as f:
        for line1 in f:
            line2 = next(f)
            # process line1 and line2 here
except StopIteration:
    print("(End)")  # do whatever you need to do with line1 alone
This keeps the code simple and yet robust. The with statement closes the file if an exception is raised, and also closes it normally once the file is exhausted and the loop exits.
Note that with needs Python 2.6, or 2.5 with the with_statement feature enabled (from __future__ import with_statement).
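As a quick check of that claim, the sketch below runs the same pattern on a file with a single line and then inspects f.closed. The temporary file and the leftover variable are assumptions added for illustration; they are not part of the code above.

```python
import os
import tempfile

# Hypothetical input: a temporary file holding one unpaired line
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("only one line\n")
    path = tmp.name

leftover = None
try:
    with open(path) as f:
        for line1 in f:
            leftover = line1   # remember the line in case it has no partner
            line2 = next(f)    # raises StopIteration when the partner is missing
            leftover = None    # the pair was complete
except StopIteration:
    pass                       # leftover now holds the unpaired line

closed_after = f.closed        # the with block closed the file on the way out
os.remove(path)
print(repr(leftover), closed_after)
```

The StopIteration raised inside the loop body propagates out through the with block, which closes the file before the except clause runs.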