read a full excel file chunk by chunk using pandas

You could use range together with the skiprows and nrows arguments of read_excel. Assuming you want to process chunks of 100 rows in a 1000-row Excel file:

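import pandas as pd

total = 1000      # number of data rows, not counting the header
chunksize = 100
for skip in range(0, total, chunksize):
    # skip the data rows already processed, but keep row 0 as the header
    df = pd.read_excel('/path/excel.xlsx', skiprows=range(1, skip + 1), nrows=chunksize)
    # process df
    ...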
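
To make the offsets concrete, here is a small sketch of the arithmetic behind that loop. The helper name chunk_offsets is hypothetical (it is not part of pandas), and it assumes you already know the number of data rows. It only computes which rows each iteration covers, including a shorter last chunk when the total is not a multiple of the chunk size:

```python
def chunk_offsets(total, chunksize):
    """Yield (skip, nrows) pairs that cover `total` data rows in order."""
    for skip in range(0, total, chunksize):
        # The last chunk may be shorter than chunksize.
        yield skip, min(chunksize, total - skip)


offsets = list(chunk_offsets(250, 100))
print(offsets)  # [(0, 100), (100, 100), (200, 50)]
```

Each pair gives the number of data rows to skip and the number of rows to read for one chunk.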

read_excel does not have a chunksize argument (unlike read_csv). You can read the file first and then split it manually:

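import numpy as np
import pandas as pd

df = pd.read_excel(file_name)  # the whole file has to be read first
n_chunks = max(1, df.shape[0] // 1000)  # number of chunks (about 1000 rows each); set it to whatever you want
# np.split raises unless the rows divide evenly, so split the row positions with np.array_split
for idx in np.array_split(np.arange(df.shape[0]), n_chunks):
    chunk = df.iloc[idx]
    # process the data
    ...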
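
If a fixed number of rows per chunk is more convenient than a fixed number of chunks, plain iloc slicing also works once the frame is in memory. This is only a minimal sketch: iter_chunks is a name made up for illustration, and the small DataFrame stands in for whatever read_excel returned.

```python
import pandas as pd


def iter_chunks(df, chunksize):
    """Yield consecutive row slices of `df`, each at most `chunksize` rows long."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


# Stand-in for a frame loaded with pd.read_excel (assumption for the demo).
df = pd.DataFrame({"a": range(10)})
sizes = [len(chunk) for chunk in iter_chunks(df, 4)]
print(sizes)  # [4, 4, 2]
```

Slicing past the end of the frame is safe with iloc, so the last chunk simply comes out shorter.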

Unfortunately, with read_excel there is no way to avoid reading the whole file into memory, so you will have to do it this way.

If you want to do this with skiprows and skipfooter, you need to know the number of rows in the DataFrame (you can read the file once to get it):

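import pandas as pd

df = pd.read_excel('/path/excel.xlsx')
df_size = df.shape[0]
columns = df.columns.values
chunksize = 1000
for i in range(0, df_size, chunksize):
    # skip the i data rows already read (keeping the header row) and everything after this chunk
    footer = max(0, df_size - (i + chunksize))
    df_chunk = pd.read_excel('/path/excel.xlsx', skiprows=range(1, i + 1), skipfooter=footer, names=columns)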
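
The row accounting for the skiprows/skipfooter approach is easy to get wrong, so here is a sketch that checks it without touching a file. The function name footer_plan and the sizes used are hypothetical. For each chunk, the rows skipped at the top, the rows read, and the rows skipped at the bottom should always add up to the total:

```python
def footer_plan(df_size, chunksize):
    """Yield (rows skipped at top, rows read, rows skipped at bottom) per chunk."""
    for start in range(0, df_size, chunksize):
        stop = min(start + chunksize, df_size)
        yield start, stop - start, df_size - stop


plan = list(footer_plan(2500, 1000))
print(plan)  # [(0, 1000, 1500), (1000, 1000, 500), (2000, 500, 0)]
```

The third value in each tuple is what would be passed as skipfooter for that chunk.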
