Resolving Python-pptx package not found
I'm trying to do something fairly simple: store each text box in a PowerPoint file as an element in a giant Python list. This code should get me to that outcome:
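import glob
from pptx import Presentation

text_array = []
# Raw string so the backslash in the Windows path is taken literally
for eachfile in glob.glob(r"master_folder\*.pptx"):
    prs = Presentation(eachfile)
    #print(eachfile)
    #print("----------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            # Only shapes that expose a text attribute are collected
            if hasattr(shape, "text"):
                text_array.append(shape.text)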
However, like some other questions on SO (PPTX Package not Found), I am greeted with the error:
PackageNotFoundError: Package not found at 'master_folder\April_2020.pptx'
What I've tried:
- double checking my versions/dependencies: all seems to be in order/compatible
- removing all spaces from the files and directories
However, the error has persisted.
Question
Can someone with experience using this library point me in the right direction for the simple task of scraping in-document text and storing it in a native Python list (as seen in my code)?
Solution 1:
Possible causes of this error:
- the file does not exist
- the file is not a valid pptx (unzip the file and check the folder and file structure)
- the file is corrupt (opening it in MS Office, making a change, and saving may fix it)
- Python has no access rights to the file
- the file is locked, e.g. it is open in MS Office
If none of these turns anything up, consider providing a sample pptx that fails to open.
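A few of the checks above can be scripted. The following is only a rough sketch using the standard library: diagnose_pptx is a hypothetical helper name, not part of python-pptx, and the messages it returns are made up for illustration. It assumes a valid pptx is a zip archive containing a [Content_Types].xml entry, and it does not detect every kind of corruption or lock.

```python
import os
import zipfile


def diagnose_pptx(path):
    """Return a short guess at why `path` may fail to open.

    Hypothetical helper for illustration; not part of python-pptx.
    """
    if not os.path.isfile(path):
        # Relative paths are resolved against the current working directory.
        return f"missing: no file at {os.path.abspath(path)}"
    if not os.access(path, os.R_OK):
        return "unreadable: Python has no read permission"
    if not zipfile.is_zipfile(path):
        return "not a zip: file is not a valid pptx or is corrupt"
    try:
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
    except (OSError, zipfile.BadZipFile) as exc:
        # Can happen when the archive is damaged or held open by another program.
        return f"cannot read archive: {exc}"
    if "[Content_Types].xml" not in names:
        return "zip without [Content_Types].xml: not an Office package"
    return "ok: looks like a readable Office package"
```

Calling diagnose_pptx(r"master_folder\April_2020.pptx") from the same working directory as the failing script, and printing os.getcwd() alongside it, may show quickly whether the relative path points where you expect.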