Pandas cannot open an Excel (.xlsx) file

Please see my code below:

import pandas
df = pandas.read_excel('cat.xlsx')

After running that, it gives me the following error:

Traceback (most recent call last):
  File "d:\OneDrive\桌面\practice.py", line 4, in <module>
    df = pandas.read_excel('cat.xlsx')
  File "D:\python\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "D:\python\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "D:\python\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "D:\python\lib\site-packages\xlrd\__init__.py", line 170, in open_workbook
    raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported

I tried uninstalling and reinstalling pandas with pip, but the error persists. I have xlrd 2.0.1 and pandas 1.1.5 installed.


Solution 1:

As noted in the release email, in the release tweet that links to it, in the large orange warning on the front page of the documentation, and (less orange, but still present) in the readme on the repo and on the release page on PyPI:

xlrd has explicitly removed support for anything other than xls files.

This is due to potential security vulnerabilities relating to the use of xlrd version 1.2 or earlier for reading .xlsx files.

In your case, the solution is to:

  • make sure you are on a recent version of pandas, at least 1.0.1, and preferably the latest release.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your pandas code to be:
    pandas.read_excel('cat.xlsx', engine='openpyxl')
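
If the same script has to handle both .xls and .xlsx files, one option is to pick the engine from the file extension instead of hard-coding it. This is only a sketch: ENGINE_BY_SUFFIX, pick_engine and read_workbook are names made up for this example (they are not part of pandas), and it assumes openpyxl is installed for .xlsx and xlrd for .xls.

```python
from pathlib import Path

# Hypothetical mapping for illustration: file extension -> pandas engine name.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def pick_engine(path):
    """Return the engine name to pass to pandas.read_excel for this path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine configured for {suffix!r} files") from None


def read_workbook(path, **kwargs):
    """Read an Excel file, choosing the engine from the extension."""
    # Imported here so pick_engine can be used even where pandas is absent.
    import pandas

    return pandas.read_excel(path, engine=pick_engine(path), **kwargs)
```

With that in place, read_workbook('cat.xlsx') would go through openpyxl, while an old-style .xls file would still go through xlrd.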
    

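Edit: pandas >= 1.2 addresses this issue: if openpyxl is installed, read_excel now defaults to it for .xlsx files in most cases, so the engine argument is no longer needed. (Release Notes)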

Solution 2:

The latest version of xlrd (2.0.1) only supports .xls files.

If you are prepared to risk potential security vulnerabilities, and risk incorrect parsing of certain files, this error can be solved by installing an older version of xlrd.

Use the command below in a shell or cmd prompt:

pip install xlrd==1.2.0
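
Whichever route you take, it can help to confirm which versions Python actually sees, since reinstalling pandas alone does not change the xlrd version. A minimal sketch using only the standard library (Python 3.8+); installed_version and xlrd_reads_xlsx are hypothetical helper names for this example, and the version check assumes a plain "major.minor.patch" version string:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def xlrd_reads_xlsx(version):
    """xlrd removed .xlsx support in 2.0, so only a major version below 2 reads it."""
    if version is None:
        return False
    major = int(version.split(".")[0])
    return major < 2


if __name__ == "__main__":
    for name in ("pandas", "xlrd", "openpyxl"):
        print(name, installed_version(name) or "not installed")
    print("xlrd can read .xlsx:", xlrd_reads_xlsx(installed_version("xlrd")))
```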