xlrd.biffh.XLRDError: Excel xlsx file; not supported [duplicate]

I am trying to read a macro-enabled Excel worksheet using pandas.read_excel with the xlrd library. It's running fine in local, but when I try to push the same into PCF, I am getting this error:

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] df1=pd.read_excel(os.path.join(APP_PATH, os.path.join("Data", "aug_latest.xlsm")),sheet_name=None)

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] return open_workbook(filepath_or_buffer)
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] File "/home/vcap/deps/0/python/lib/python3.8/site-packages/xlrd/__init__.py", line 170, in open_workbook
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] xlrd.biffh.XLRDError: Excel xlsx file; not supported

How can I resolve this error?


Solution 1:

As noted in the release email, linked to from the release tweet and noted in large orange warning that appears on the front page of the documentation, and less orange, but still present, in the readme on the repository and the release on pypi:

xlrd has explicitly removed support for anything other than xls files.

In your case, the solution is to:

  • make sure you are on a recent version of Pandas, at least 1.0.1, and preferably the latest release. 1.2 will make his even clearer.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your Pandas code to be:
    df1 = pd.read_excel(
         os.path.join(APP_PATH, "Data", "aug_latest.xlsm"),
         engine='openpyxl',
    )
    

Solution 2:

The previous version, xlrd 1.2.0, may appear to work, but it could also expose you to potential security vulnerabilities. With that warning out of the way, if you still want to give it a go, type the following command:

pip install xlrd==1.2.0