How to concatenate three excels files xlsx using python?

Solution 1:

Here's a pandas-based approach. (It's using openpyxl behind the scenes.)

import pandas as pd

# filenames
excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"]

# read them in
excels = [pd.ExcelFile(name) for name in excel_names]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]

# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]

# concatenate them..
combined = pd.concat(frames)

# write it out
combined.to_excel("c.xlsx", header=False, index=False)

Solution 2:

I'd use xlrd and xlwt. Assuming you literally just need to append these files (rather than doing any real work on them), I'd do something like: Open up a file to write to with xlwt, and then for each of your other three files, loop over the data and add each row to the output file. To get you started:

import xlwt
import xlrd

wkbk = xlwt.Workbook()
outsheet = wkbk.add_sheet('Sheet1')

xlsfiles = [r'C:\foo.xlsx', r'C:\bar.xlsx', r'C:\baz.xlsx']

outrow_idx = 0
for f in xlsfiles:
    # This is all untested; essentially just pseudocode for concept!
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in xrange(insheet.nrows):
        for col_idx in xrange(insheet.ncols):
            outsheet.write(outrow_idx, col_idx, 
                           insheet.cell_value(row_idx, col_idx))
        outrow_idx += 1
wkbk.save(r'C:\combined.xls')

If your files all have a header line, you probably don't want to repeat that, so you could modify the code above to look more like this:

firstfile = True # Is this the first sheet?
for f in xlsfiles:
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in xrange(0 if firstfile else 1, insheet.nrows):
        pass # processing; etc
    firstfile = False # We're done with the first sheet.

Solution 3:

When I combine excel files (mydata1.xlsx, mydata2.xlsx, mydata3.xlsx) for data analysis, here is what I do:

import pandas as pd
import numpy as np
import glob

all_data = pd.DataFrame()
for f in glob.glob('myfolder/mydata*.xlsx'):
   df = pd.read_excel(f)
   all_data = all_data.append(df, ignore_index=True)

Then, when I want to save it as one file:

writer = pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Sheet1')
writer.save()