Reading two text files line by line simultaneously
I have two text files in two different languages, aligned line by line. That is, the first line in textfile1 corresponds to the first line in textfile2, and so on.
Is there a way to read both files line by line simultaneously?
Below is a sample of what the files look like; imagine that each file has around 1,000,000 lines.
textfile1:
This is a the first line in English
This is a the 2nd line in English
This is a the third line in English
textfile2:
C'est la première ligne en Français
C'est la deuxième ligne en Français
C'est la troisième ligne en Français
desired output:
This is a the first line in English\tC'est la première ligne en Français
This is a the 2nd line in English\tC'est la deuxième ligne en Français
This is a the third line in English\tC'est la troisième ligne en Français
There is a Java version of this question ("Read two textfile line by line simultaneously -java"), but Python doesn't use a BufferedReader to read line by line. So how would it be done in Python?
Solution 1:
    from itertools import izip

    with open("textfile1") as textfile1, open("textfile2") as textfile2:
        for x, y in izip(textfile1, textfile2):
            x = x.strip()
            y = y.strip()
            print("{0}\t{1}".format(x, y))
In Python 3, replace itertools.izip with the built-in zip.
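If the same script has to run on both Python 2 and Python 3, one option is to pick the lazy zip at import time. The sketch below is only an illustration: the names lazy_zip and paired_lines are hypothetical, not part of the answer above.

```python
try:
    from itertools import izip as lazy_zip  # Python 2: lazy counterpart of zip
except ImportError:
    lazy_zip = zip  # Python 3: the built-in zip is already lazy


def paired_lines(path1, path2):
    """Yield (line1, line2) tuples with surrounding whitespace stripped."""
    with open(path1) as f1, open(path2) as f2:
        # Only one line from each file is held in memory at a time.
        for x, y in lazy_zip(f1, f2):
            yield x.strip(), y.strip()
```

Calling it as `for x, y in paired_lines("textfile1", "textfile2"): print("{0}\t{1}".format(x, y))` should print the same tab-separated pairs as the loop above.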
Solution 2:
    file1, file2 = "textfile1", "textfile2"  # paths to the two aligned files

    with open(file1) as f1, open(file2) as f2:
        for x, y in zip(f1, f2):
            print("{0}\t{1}".format(x.strip(), y.strip()))
output:
This is a the first line in English C'est la première ligne en Français
This is a the 2nd line in English C'est la deuxième ligne en Français
This is a the third line in English C'est la troisième ligne en Français
Solution 3:
We could use a generator to make opening the files more convenient; this approach also extends easily to iterating over more than two files simultaneously.
    filenames = ['textfile1', 'textfile2']

    def gen_line(filename):
        with open(filename) as f:
            for line in f:
                yield line.strip()

    gens = [gen_line(n) for n in filenames]

    for file1_line, file2_line in zip(*gens):
        print("\t".join([file1_line, file2_line]))
Note:
- This is Python 3 code. For Python 2, use itertools.izip, as other people said.
- zip stops after the shortest file has been iterated over; use itertools.zip_longest if that matters.
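To illustrate the zip_longest remark, here is a minimal Python 3 sketch, assuming that padding the shorter file with empty strings is acceptable. The function name paste_longest and the fill parameter are hypothetical, chosen only for this example.

```python
from itertools import zip_longest


def paste_longest(path1, path2, fill=""):
    """Yield tab-joined line pairs; the shorter file is padded with `fill`."""
    with open(path1, encoding="utf-8") as f1, open(path2, encoding="utf-8") as f2:
        # Unlike zip, zip_longest keeps going until BOTH files are exhausted.
        for x, y in zip_longest(f1, f2, fillvalue=fill):
            yield "{0}\t{1}".format(x.strip(), y.strip())
```

For an aligned corpus, a padded line usually points to a misalignment, so it may be better to check for an empty side and report it than to pass it through silently.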