Read XLSB File in Pandas Python
There are many questions on this, but there has been no simple answer on how to read an xlsb file into pandas. Is there an easy way to do this?
With the 1.0.0
release of pandas - January 29, 2020
, support for binary Excel files was added.
import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
Notes:
- You will need to upgrade pandas -
pip install pandas --upgrade
- You will need to install
pyxlsb
-pip install pyxlsb
Hi actually there is a way. Just use pyxlsb library.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
with wb.get_sheet(1) as sheet:
for row in sheet.rows():
df.append([item.v for item in row])
df = pd.DataFrame(df[1:], columns=df[0])
UPDATE: as of pandas version 1.0 read_excel() now can read binary Excel (.xlsb) files by passing engine='pyxlsb'
Source: https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html
Pyxlsb indeed is an option to read xlsb file, however, is rather limited.
I suggest using the xlwings package which makes it possible to read and write xlsb files without losing sheet formating, formulas, etc. in the xlsb file. There is extensive documentation available.
import pandas as pd
import xlwings as xw
app = xw.App()
book = xw.Book('file.xlsb')
sheet = book.sheets('sheet_name')
df = sheet.range('A1').options(pd.DataFrame, expand='table').value
book.close()
app.kill()
'A1' in this case is the starting position of the excel table. To write to xlsb file, simply write:
sheet.range('A1').value = df