How to extract images from a Word document on Linux

Since docx files are zip archives, you can unzip the docx file and pick out the image files.

I don't have Microsoft Office to test with, so I downloaded some random docx files from the internet. In all of them the images were stored in a word/media directory inside the archive.

This command will extract every file in the archive's word/media directory:

unzip foo.docx "word/media/*"

This command will extract only *.jpeg files:

unzip foo.docx "*.jpeg"

Note that you have to specify "*.jpg" if the files are saved as jpg instead of jpeg. Images may also be stored in other formats, such as png. I don't know whether images can be stored anywhere other than the word/media directory. To check, list the contents of the archive with unzip -l foo.docx.
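If you would rather not guess the extension or the directory, the same listing can be done from Python's standard zipfile module. This is only a minimal sketch: the function name list_images and the set of extensions are my own assumptions, not something a docx file defines.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions treated as images here. This set is an assumption; extend it as needed.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".bmp",
    ".tif", ".tiff", ".emf", ".wmf", ".svg",
}


def list_images(docx_path):
    """Return the names of archive members whose extension looks like an image.

    The match is case-insensitive and ignores the directory, so images stored
    outside word/media would be listed as well.
    """
    with zipfile.ZipFile(docx_path) as archive:
        return [
            name
            for name in archive.namelist()
            if PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS
        ]
```

Calling list_images("foo.docx") returns the member names, which you can then pass to unzip or to ZipFile.extract.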

I wrote an open-source Python program called ofc_media that does essentially the same unzipping described in lesmana's answer, but automates the search. It also works on OpenDocument files and can limit the extraction to certain file extensions, among other things.
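To give a rough idea of what this kind of automation can look like, here is a short sketch using only the standard library. It is not ofc_media's actual code. The function name extract_media and its parameters are hypothetical, and the media locations are assumptions: word/media/ for docx, as observed in the first answer, and Pictures/ for OpenDocument files.

```python
import os
import zipfile

# Assumed media locations: word/media/ for docx and Pictures/ for OpenDocument.
# Neither is taken from ofc_media itself.
MEDIA_PREFIXES = ("word/media/", "Pictures/")


def extract_media(doc_path, out_dir, extensions=None):
    """Copy media files out of a docx/odt archive into out_dir.

    extensions: optional iterable such as {".png", ".jpeg"}; None keeps everything.
    Returns the list of paths that were written.
    """
    wanted = None if extensions is None else {e.lower() for e in extensions}
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(doc_path) as archive:
        for info in archive.infolist():
            name = info.filename
            # Skip directory entries and anything outside the assumed media folders.
            if info.is_dir() or not name.startswith(MEDIA_PREFIXES):
                continue
            base = os.path.basename(name)
            if not base:
                continue
            ext = os.path.splitext(base)[1].lower()
            if wanted is not None and ext not in wanted:
                continue
            # Files are flattened into out_dir; same-named files would overwrite each other.
            target = os.path.join(out_dir, base)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

For example, extract_media("foo.docx", "images", extensions={".png"}) would write only the png files into an images directory.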
