How can I read a large text file in Python, line by line, without loading it into memory?
I need to read a large file line by line. Let's say the file is larger than 5 GB and I need to read each line, but obviously I do not want to use readlines(),
because it would create a very large list in memory.
Will the code below work for this case? Does xreadlines
itself read one line at a time into memory? Is the generator expression needed?
f = (line for line in open("log.txt").xreadlines()) # how much is loaded in memory?
f.next()
Plus, what can I do to read the file in reverse order, just like the Linux tail
command?
I found:
http://code.google.com/p/pytailer/
and
"python head, tail and backward read by lines of a text file"
Both worked very well!
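For completeness, here is a minimal sketch of a tail-like read that seeks backwards from the end of the file in binary chunks, so the whole file never has to fit in memory. The tail() name, chunk size, and UTF-8 decoding are my assumptions, not pytailer's API:

import os

def tail(path, n=10, chunk_size=8192):
    """Return the last n lines of a file without reading it all into memory."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)  # jump to the end of the file
        position = f.tell()
        buffer = b""
        lines = []
        # Read backwards in chunks until we have more than n lines
        # (the extra line guards against a partial first line) or hit the start.
        while position > 0 and len(lines) <= n:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            lines = buffer.splitlines()
        return [line.decode("utf-8", errors="replace") for line in lines[-n:]]

Usage would look like for line in tail("log.txt", n=20): print(line).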
I provided this answer because Keith's, while succinct, doesn't close the file explicitly:
with open("log.txt") as infile:
    for line in infile:
        do_something_with(line)
All you need to do is use the file object as an iterator:
for line in open("log.txt"):
    do_something_with(line)
Even better is to use a context manager, available in recent Python versions:
with open("log.txt") as fileobject:
    for line in fileobject:
        do_something_with(line)
This will automatically close the file as well.
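To make the pattern concrete, here is one possible do_something_with, counting error lines; the "ERROR" condition and log.txt name are placeholders, and only one line is ever held in memory regardless of file size:

error_count = 0
with open("log.txt") as fileobject:
    for line in fileobject:      # the file object yields one line at a time
        if "ERROR" in line:      # placeholder condition; adapt as needed
            error_count += 1
print(error_count, "matching lines")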