In Python, is read() or readlines() faster?
Solution 1:
For a text file, just iterating over it with a for loop is almost always the way to go. Never mind about speed; it is the cleanest.
In some versions of Python, readline() really does read just a single line while the for loop reads large chunks and splits them up into lines, so the loop may be faster. I think more recent versions of Python use buffering for readline() as well, so the performance difference will be minuscule (for is probably still microscopically faster because it avoids a method call per line). However, choosing one over the other for performance reasons is probably premature optimisation.
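A minimal sketch of the two approaches, for comparison; "sample.txt" is a hypothetical path assumed to exist, and the timing harness is only illustrative:

```python
import timeit

def with_for_loop(path):
    # Idiomatic: the file object is its own line iterator.
    count = 0
    with open(path) as f:
        for line in f:
            count += 1
    return count

def with_readline(path):
    # Equivalent logic using explicit readline() calls.
    count = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:          # readline() returns '' at EOF
                break
            count += 1
    return count

if __name__ == "__main__":
    path = "sample.txt"  # assumed to exist for this sketch
    for fn in (with_for_loop, with_readline):
        t = timeit.timeit(lambda: fn(path), number=10)
        print(f"{fn.__name__}: {t:.3f}s for 10 passes")
```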
Edit to add: I just checked back through some Python release notes. The Python 2.5 notes said:

"It's now illegal to mix iterating over a file with for line in file and calling the file object's read()/readline()/readlines() methods."

Python 2.6 introduced TextIOBase, which supports both iterating and readline() simultaneously. Python 2.7 fixed interleaving read() and readline().
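On a modern (Python 3, io.TextIOBase-based) file object the two styles can be mixed freely; a small sketch, with "records.txt" as a hypothetical file:

```python
with open("records.txt") as f:
    header = f.readline()              # grab the header line explicitly
    print("header:", header.rstrip("\n"))
    for line in f:                     # then iterate over the remaining lines
        print("row:", line.rstrip("\n"))
```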
Solution 2:
If the file is huge, read() is definitely a bad idea, as (without a size parameter) it loads the whole file into memory.
readline() reads only one line at a time, so I would say it is the better choice for huge files.
And just iterating over the file object should be as effective as using readline().
See http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects for more info
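A minimal sketch of the contrast, assuming a hypothetical large file "huge.log":

```python
def load_everything(path):
    # read() with no size argument pulls the entire file into one string,
    # so memory use grows with the file size.
    with open(path) as f:
        return f.read()

def count_errors(path):
    # Iterating keeps only one line in memory at a time, so memory use
    # stays roughly constant regardless of file size.
    errors = 0
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                errors += 1
    return errors

if __name__ == "__main__":
    print(count_errors("huge.log"))
```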