Efficiently Read last 'n' rows of CSV into DataFrame

I don't think pandas offers a way to do this directly in read_csv (its nrows option, for instance, counts rows from the top of the file).

Perhaps the neatest approach (it makes a single pass over the file) is to use collections.deque with a maximum length:

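from collections import deque
from io import StringIO  # Python 3 (on Python 2 this was: from StringIO import StringIO)
import pandas as pd

n = 2  # number of lines to keep from the end of the file

with open(fname, 'r') as f:
    q = deque(f, n)  # maxlen=n: older lines are discarded as newer ones are read

# For a CSV whose last two lines are "7,8,9" and "10,11,12":
# q == deque(['7,8,9\n', '10,11,12'], maxlen=2)

df = pd.read_csv(StringIO(''.join(q)), header=None)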
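If the file has a header row, the deque approach above would drop it. A small helper can read the header line first and then let the deque collect only data rows. This is a sketch: the name tail_csv and its has_header parameter are hypothetical, and it assumes one record per line (no quoted fields containing embedded newlines).

```python
from collections import deque
from io import StringIO

import pandas as pd


def tail_csv(path, n, has_header=False):
    """Read the last n data rows of a CSV in one pass, optionally keeping its header (hypothetical helper)."""
    with open(path) as f:
        # Consume the header line first so it never takes up one of the n slots.
        header = next(f, '') if has_header else ''
        # A deque with maxlen=n keeps only the n most recently read lines.
        last = deque(f, maxlen=n)
    return pd.read_csv(StringIO(header + ''.join(last)),
                       header=0 if has_header else None)
```

Memory use stays at roughly n lines regardless of file size, but the whole file is still read once from start to end.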

Another option worth trying is to count the lines in a first pass, then read the file again with read_csv, skipping all but the last n rows (that is, skipping total - n rows).
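A minimal sketch of that two-pass idea, assuming one record per line and no blank lines (blank lines are counted in the first pass but skipped by read_csv by default). The function name and has_header parameter are hypothetical. When a header is present, passing a range to skiprows keeps line 0 and skips only the older data lines.

```python
import pandas as pd


def read_last_n_two_pass(path, n, has_header=False):
    """Count lines, then re-read the file skipping everything but the last n rows (hypothetical helper)."""
    # First pass: count physical lines.
    with open(path) as f:
        total = sum(1 for _ in f)
    if has_header:
        # Keep line 0 (the header) and skip data lines 1 .. total-n-1.
        return pd.read_csv(path, skiprows=range(1, max(total - n, 1)))
    # No header: an integer skiprows skips that many lines at the start.
    return pd.read_csv(path, header=None, skiprows=max(total - n, 0))
```

This reads the file twice, but read_csv's parser does the second pass, so nothing beyond the final n rows is materialized.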

Here's a handy way to do it with the tailer package. It works well for what I need:

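import io
import pandas as pd
import tailer

n = 15  # number of lines to read from the end of the file

with open(filename) as f:
    # tailer.tail returns the last n lines as a list of strings
    last_lines = tailer.tail(f, n)

# Join the lines back into one CSV text and parse it
df = pd.read_csv(io.StringIO('\n'.join(last_lines)), header=None)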

This requires the tailer package:

pip install --user tailer
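If you'd rather avoid a dependency, a standard-library alternative is to seek to the end of the file and read fixed-size blocks backwards until enough newlines have been collected. This is my own sketch of that backward-seek technique, not tailer's code; tail_lines, read_last_n and block_size are hypothetical names, and it assumes UTF-8 text with one record per line and no trailing blank lines.

```python
import io
import os

import pandas as pd


def tail_lines(path, n, block_size=4096):
    """Return the last n lines (as bytes) by reading blocks backwards from the end of the file."""
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b''
        # n+1 newlines guarantee n complete lines even if the file ends with a newline.
        while pos > 0 and data.count(b'\n') <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Any partial first line falls outside the last n and is dropped here.
    return data.splitlines()[-n:]


def read_last_n(path, n):
    """Parse the last n lines of a header-less CSV into a DataFrame."""
    text = b'\n'.join(tail_lines(path, n)).decode('utf-8')
    return pd.read_csv(io.StringIO(text), header=None)
```

Unlike the deque approach, this only touches the tail of the file, so its cost depends on n rather than on the file size.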
