Import multiple excel files into python pandas and concatenate them into one dataframe
As mentioned in the comments, one error you are making is that you are looping over an empty list.
Here is how I would do it, using an example of having 5 identical Excel files that are appended one after another.
(1) Imports:
import os
import pandas as pd
(2) List files:
path = os.getcwd()
files = os.listdir(path)
files
Output:
['.DS_Store',
'.ipynb_checkpoints',
'.localized',
'Screen Shot 2013-12-28 at 7.15.45 PM.png',
'test1 2.xls',
'test1 3.xls',
'test1 4.xls',
'test1 5.xls',
'test1.xls',
'Untitled0.ipynb',
'Werewolf Modelling',
'~$Random Numbers.xlsx']
(3) Pick out 'xls' files:
files_xls = [f for f in files if f[-3:] == 'xls']
files_xls
Output:
['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
(4) Initialize empty dataframe:
df = pd.DataFrame()
(5) Loop over list of files to append to empty dataframe:
for f in files_xls:
data = pd.read_excel(f, 'Sheet1')
df = df.append(data)
(6) Enjoy your new dataframe. :-)
df
Output:
Result Sample
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
this works with python 2.x
be in the directory where the Excel files are
see http://pbpython.com/excel-file-combine.html
import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
df = pd.read_excel(f)
all_data = all_data.append(df,ignore_index=True)
# now save the data frame
writer = pd.ExcelWriter('output.xlsx')
all_data.to_excel(writer,'sheet1')
writer.save()
This can be done in this way:
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("/path/to/directory/*.xlsx"):
df = pd.read_excel(f)
all_data = all_data.append(df,ignore_index=True)
all_data.to_csv("new_combined_file.csv")