reading excel to a python data frame starting from row 5 and including headers
how do I import excel data into a dataframe in python.
Basically the current excel workbook runs some vba on opening which refreshes a pivot table and does some other stuff.
Then I wish to import the results of the pivot table refresh into a dataframe in python for further analysis.
import xlrd
wb = xlrd.open_workbook('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
#sheetnames
print wb.sheet_names()
#number of sheets
print wb.nsheets
The refreshing and opening of the file works fine. But how do i select the data from the first sheet from say row 5 including header down to last record n.
Solution 1:
You can use pandas' ExcelFile parse
method to read Excel sheets, see io docs:
xls = pd.ExcelFile('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])
skiprows
will ignore the first 4 rows (i.e. start at row index 4), and several other options.
Solution 2:
The accepted answer is old (as discussed in comments of the accepted answer). Now the preferred option is using pd.read_excel(). For example:
df = pandas.read_excel('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm'), skiprows=[0,1,2,3,4])