Read a full Excel file chunk by chunk using pandas
You could use range. Assuming you want to process a 1000-row Excel file in chunks of 100 rows:
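import pandas as pd

total = 1000  # number of data rows in the sheet
chunksize = 100
for skip in range(0, total, chunksize):
    # skip the data rows already read, but keep row 0 so every chunk gets the header
    df = pd.read_excel('/path/excel.xlsx', skiprows=range(1, skip + 1), nrows=chunksize)
    # process df
    ...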
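If the total number of rows is not known up front, the same loop can be wrapped in a generator that simply stops once a read comes back empty. This is only a sketch: `iter_excel_chunks` is a made-up helper name, and it assumes the sheet has a single header row and that `read_excel` returns an empty DataFrame once every data row has been skipped.

```python
import pandas as pd


def iter_excel_chunks(path, chunksize, **kwargs):
    """Yield DataFrames of up to `chunksize` rows until the sheet is exhausted."""
    skip = 0
    while True:
        # Row 0 (the header) is never skipped, so every chunk has the same columns.
        df = pd.read_excel(path, skiprows=range(1, skip + 1), nrows=chunksize, **kwargs)
        if df.empty:
            break
        yield df
        skip += chunksize
```

You would use it as `for df in iter_excel_chunks('/path/excel.xlsx', 100): ...`. Keep in mind that every call to read_excel opens the workbook again, so this limits how many rows you hold at once but does not make the reading itself faster.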
read_excel does not have a chunksize argument. You can read the whole file first and then split it manually:
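import numpy as np
import pandas as pd

df = pd.read_excel(file_name)  # the whole file has to be read first
# number of chunks of roughly 1000 rows each; set the number to whatever you want
n_chunks = max(df.shape[0] // 1000, 1)
# array_split, unlike split, also works when the rows do not divide evenly
for rows in np.array_split(np.arange(df.shape[0]), n_chunks):
    chunk = df.iloc[rows]
    # process the data
    ...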
Unfortunately, with Excel there is no way around reading the whole file into memory, so you will have to do it this way.
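If you would rather fix the number of rows per chunk than the number of chunks, plain positional slicing on the already loaded DataFrame does the job without NumPy. A minimal sketch, where `iter_row_chunks` and the small demo frame are made up for illustration:

```python
import pandas as pd


def iter_row_chunks(df, chunksize):
    """Yield consecutive slices of `df` with at most `chunksize` rows each."""
    for start in range(0, len(df), chunksize):
        # iloc slicing clips at the end, so the last chunk may be shorter
        yield df.iloc[start:start + chunksize]


frame = pd.DataFrame({"value": range(10)})  # stand-in for the frame read from Excel
sizes = [len(chunk) for chunk in iter_row_chunks(frame, 4)]
print(sizes)  # [4, 4, 2]
```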
If you want to do this with skiprows and skipfooter, you need to know the number of rows in the DataFrame (you can read the file once first to get it).
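import pandas as pd

df = pd.read_excel('/path/excel.xlsx')  # read once to get the row count and column names
df_size = df.shape[0]
columns = df.columns.values
chunksize = 1000
for i in range(0, df_size, chunksize):
    # rows left after this chunk; 0 for the last (possibly shorter) chunk
    footer = max(df_size - (i + chunksize), 0)
    # header=None, so skiprows=i + 1 skips the header row plus the i data rows already read
    df_chunk = pd.read_excel('/path/excel.xlsx', header=None, names=columns, skiprows=i + 1, skipfooter=footer)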
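The skiprows/skipfooter arithmetic is easy to get wrong by one, so it can help to check it separately from the Excel reading. The sketch below only computes the two numbers for each chunk. `chunk_bounds` is a made-up name, and it assumes a sheet with exactly one header row that is skipped along with the earlier data rows (that is, read_excel is called with header=None and names=...).

```python
def chunk_bounds(total_rows, chunksize):
    """Yield (skiprows, skipfooter) for each chunk of `chunksize` data rows."""
    for start in range(0, total_rows, chunksize):
        skiprows = start + 1  # the header row plus the data rows before this chunk
        skipfooter = max(total_rows - (start + chunksize), 0)  # data rows after it
        yield skiprows, skipfooter


print(list(chunk_bounds(2500, 1000)))
# [(1, 1500), (1001, 500), (2001, 0)]
```

For every chunk, skiprows + skipfooter + the chunk length adds up to the header row plus all data rows, which is a quick sanity check on the numbers.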