Way to read first few lines for pandas dataframe
I think you can use the nrows
parameter. From the docs:
nrows : int, default None
Number of rows of file to read. Useful for reading pieces of large files
which seems to work. Using one of the standard large test files (988504479 bytes, 5344499 lines):
In [1]: import pandas as pd
In [2]: time z = pd.read_csv("P00000001-ALL.csv", nrows=20)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
In [3]: len(z)
Out[3]: 20
In [4]: time z = pd.read_csv("P00000001-ALL.csv")
CPU times: user 27.63 s, sys: 1.92 s, total: 29.55 s
Wall time: 30.23 s
I would use 'skiprows' argument in read_csv, e.g.,:
df = pd.read_csv(filename, skiprows=range(2, 20000), nrows=10000)