Efficiently finding the last line in a text file
I need to extract the last line from a number of very large (several hundred megabyte) text files to get certain data. Currently, I am using Python to iterate over every line until I reach the end of the file and then process the last line read, but I am certain there is a more efficient way to do this.
What is the best way to retrieve just the last line of a text file using Python?
Solution 1:
Not the straightforward way, but probably much faster than a simple Python implementation that reads every line: shell out to the tail utility (Unix-like systems only).
import subprocess
line = subprocess.check_output(['tail', '-n', '1', filename], text=True)  # text=True (Python 3.7+) returns str, not bytes
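A sketch of Solution 1 wrapped in a small helper, assuming a Unix-like system. The function name last_line_via_tail is hypothetical; it checks that tail is on PATH with shutil.which, strips the line ending that tail passes through, and decodes the output as UTF-8 (an assumption; substitute the file's real encoding).

```python
import shutil
import subprocess


def last_line_via_tail(filename):
    """Hypothetical helper: return the last line of filename using the external tail utility."""
    tail = shutil.which("tail")
    if tail is None:
        raise RuntimeError("the 'tail' utility was not found on PATH")
    # check_output raises CalledProcessError if tail exits with a non-zero
    # status (for example, when the file does not exist).
    out = subprocess.check_output([tail, "-n", "1", filename])
    # tail keeps the trailing line ending; strip it and decode (UTF-8 assumed).
    return out.rstrip(b"\r\n").decode("utf-8")
```

Starting a process has a fixed cost, so this mainly pays off for large files; for many small files a pure-Python approach may be just as quick.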
Solution 2:
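with open('output.txt', 'r') as f:
    # Reads the entire file into memory: simple, but slow and memory-hungry for very large files.
    lines = f.read().splitlines()
last_line = lines[-1]  # raises IndexError if the file is empty
print(last_line)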
Solution 3:
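Use the file's seek method with a negative offset and whence=os.SEEK_END to read a block from the end of the file. Search that block for the last line-end character(s) and grab all the characters after it. If there is no line end, back up farther and repeat the process. In Python 3 the file must be opened in binary mode ('rb'), because text-mode files do not support nonzero seeks relative to the end; the data read is therefore bytes rather than str.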
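import os


def last_line(in_file, block_size=1024, ignore_ending_newline=False):
    # in_file must be opened in binary mode ('rb'); the result is bytes.
    # Only LF (newline) bytes count as line ends, so with CRLF files a
    # trailing CR is left on the returned line.
    suffix = b""
    in_file.seek(0, os.SEEK_END)
    in_file_length = in_file.tell()
    seek_offset = 0

    while -seek_offset < in_file_length:
        # Step one block further back from the end.
        seek_offset -= block_size
        if -seek_offset > in_file_length:
            # Ran past the start of the file (can't seek before it):
            # shrink the block to cover only what is left.
            block_size -= -seek_offset - in_file_length
            seek_offset = -in_file_length
        in_file.seek(seek_offset, os.SEEK_END)
        buf = in_file.read(block_size)

        # On the first block read (the end of the file), optionally drop a trailing newline.
        if ignore_ending_newline and seek_offset == -block_size and buf.endswith(b"\n"):
            buf = buf[:-1]
        # Search for the last line end in this block.
        pos = buf.rfind(b"\n")
        if pos != -1:
            # Found a line end: everything after it, plus the later blocks, is the last line.
            return buf[pos + 1:] + suffix
        # No line end yet: keep this block and read further back.
        suffix = buf + suffix

    # No line end anywhere: the whole file is a single line.
    return suffix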
Note that this will not work on streams that don't support seek, such as stdin or sockets. In those cases you're stuck reading the whole thing (which is also what the tail command has to do with such input).
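For a non-seekable stream, reading the whole thing can at least be done in constant memory. A minimal sketch, assuming the stream yields lines when iterated (as sys.stdin does): collections.deque with maxlen=1 consumes the iterable and keeps only the most recent item. The helper name last_line_of_stream is hypothetical.

```python
import collections


def last_line_of_stream(stream):
    """Hypothetical helper: return the last line of an iterable of lines, or None if it is empty.

    The whole stream is read once, but only one line is held in memory at a time.
    """
    # A deque with maxlen=1 discards every element except the most recent one.
    tail = collections.deque(stream, maxlen=1)
    return tail[0] if tail else None
```

For example, last_line_of_stream(sys.stdin) returns the final line of piped input, including its newline if it has one.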