Efficiently Read last 'n' rows of CSV into DataFrame
I don't think pandas offers a way to do this in read_csv
.
Perhaps the neatest (in one pass) is to use collections.deque
:
from collections import deque
from StringIO import StringIO
with open(fname, 'r') as f:
q = deque(f, 2) # replace 2 with n (lines read at the end)
In [12]: q
Out[12]: deque(['7,8,9\n', '10,11,12'], maxlen=2)
# these are the last two lines of my csv
In [13]: pd.read_csv(StringIO(''.join(q)), header=None)
Another option worth trying is to get the number of lines in a first pass and then read the file again, skip that number of rows (minus n) using read_csv
...
Here's a handy way to do. Works well for what I like to do -
import tailer
import pandas as pd
import io
with open(filename) as file:
last_lines = tailer.tail(file, 15)
df = pd.read_csv(io.StringIO('\n'.join(last_lines)), header=None)
You need to install tailer, to have this working:
pip install --user tailer