How can I determine the page count of odt, doc, docx and other office documents from the CLI?
It's easy to find the page count of a PDF document from the command line:
pdfinfo sample.pdf | grep ^Pages:
... but I haven't been able to find a similar method for odt files and other office documents.
Is there a way to programmatically determine the page count of these documents?
Thanks for all the answers, everyone. With your help I was able to compile a list of commands that can extract the page count from almost all relevant office documents:
DOCX/PPTX
unzip -p 'sample.docx' docProps/app.xml | grep -oP '(?<=\<Pages\>).*(?=\</Pages\>)'
unzip -p 'sample.pptx' docProps/app.xml | grep -oP '(?<=\<Slides\>).*(?=\</Slides\>)'
Note: unzip can be installed with sudo apt-get install unzip.
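The grep -P option above needs GNU grep built with PCRE support, which is not available everywhere. As a rough sketch, assuming only a POSIX sed, the same value could be pulled out as shown below. The function names extract_pages and docx_pages are made up for this example. Keep in mind that the number in docProps/app.xml is whatever the application wrote when the file was last saved, so it may be missing or out of date.

```shell
# extract_pages (hypothetical helper): read app.xml on stdin and print
# the number found between <Pages> and </Pages>, if any.
extract_pages() {
    sed -n 's:.*<Pages>\([0-9][0-9]*\)</Pages>.*:\1:p'
}

# docx_pages (hypothetical helper): print the stored page count of a
# .docx file, or fail with status 1 if no <Pages> entry can be read.
docx_pages() {
    docx_pages_n=$(unzip -p "$1" docProps/app.xml 2>/dev/null | extract_pages)
    if [ -z "$docx_pages_n" ]; then
        echo "docx_pages: no <Pages> entry found in $1" >&2
        return 1
    fi
    echo "$docx_pages_n"
}
```

Calling docx_pages sample.docx should then print just the number, and the non-zero exit status makes a missing count easy to detect in a script.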
DOC/PPT
wvSummary sample.doc | grep -oP '(?<=of Pages = )[ A-Za-z0-9]*'
wvSummary sample.ppt | grep -oP '(?<=of Slides = )[ A-Za-z0-9]*'
Note: wvSummary (case-sensitive!) is part of the wv package. Install it with sudo apt-get install wv.
ODT
unzip -p sample.odt meta.xml | grep -oP '(?<=page-count=")[ A-Za-z0-9]*'
PDF
pdfinfo sample.pdf | grep -oP '^Pages:\s+\K[0-9]+'
Note: pdfinfo is part of poppler-utils and should come preinstalled on Ubuntu.
DJVU
djvused -e "n" sample.djvu
Note: djvused is part of the djvulibre-bin package and may be installed with sudo apt-get install djvulibre-bin.
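If you need this for mixed directories, the commands above could be wrapped in one function that picks the right tool from the file extension. This is only a sketch: the name pagecount and the exit code 2 for unknown types are my own choices, and it assumes unzip, wv, poppler-utils and djvulibre-bin are installed. The PDF branch uses awk instead of grep so that the padding after "Pages:" is dropped.

```shell
# pagecount (hypothetical wrapper): print the page or slide count of a file,
# choosing the extraction command by file extension.
pagecount() {
    pc_file=$1
    # Lower-case the extension so SAMPLE.DOCX is treated like sample.docx.
    pc_ext=$(printf '%s' "${pc_file##*.}" | tr '[:upper:]' '[:lower:]')
    case $pc_ext in
        docx) unzip -p "$pc_file" docProps/app.xml | grep -oP '(?<=\<Pages\>).*(?=\</Pages\>)' ;;
        pptx) unzip -p "$pc_file" docProps/app.xml | grep -oP '(?<=\<Slides\>).*(?=\</Slides\>)' ;;
        doc)  wvSummary "$pc_file" | grep -oP '(?<=of Pages = )[ A-Za-z0-9]*' ;;
        ppt)  wvSummary "$pc_file" | grep -oP '(?<=of Slides = )[ A-Za-z0-9]*' ;;
        odt)  unzip -p "$pc_file" meta.xml | grep -oP '(?<=page-count=")[ A-Za-z0-9]*' ;;
        pdf)  pdfinfo "$pc_file" | awk '/^Pages:/ {print $2}' ;;
        djvu) djvused -e "n" "$pc_file" ;;
        *)    echo "pagecount: unsupported file type: $pc_ext" >&2
              return 2 ;;
    esac
}
```

With that defined, something like for f in *; do printf '%s: ' "$f"; pagecount "$f"; done would list the counts for every supported file in the current directory.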
I didn't find a way to extract odt file info the way pdfinfo does for PDFs, but you can write a quick script that uses pdfinfo on odt files: convert each odt file to PDF, read the page count, and then delete the converted file if you don't need it:
libreoffice --headless --invisible --convert-to pdf sample.odt
pdfinfo sample.pdf | grep ^Pages:
rm sample.pdf
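To run this over several files without leaving PDFs behind (or overwriting an existing sample.pdf), the conversion could go into a temporary directory. Below is a minimal sketch, assuming libreoffice and pdfinfo are on the PATH. The function name odt_pages is just something I picked, and --outdir is the LibreOffice option that sets where the converted file is written.

```shell
# odt_pages (hypothetical helper): convert one document to PDF in a scratch
# directory, print its page count, and clean up afterwards.
odt_pages() {
    if [ ! -f "$1" ]; then
        echo "odt_pages: no such file: $1" >&2
        return 1
    fi
    op_tmp=$(mktemp -d) || return 1
    libreoffice --headless --convert-to pdf --outdir "$op_tmp" "$1" >/dev/null 2>&1
    # LibreOffice names the output after the input file, with a .pdf extension.
    op_pdf="$op_tmp/$(basename "${1%.*}").pdf"
    op_status=1
    if [ -f "$op_pdf" ]; then
        pdfinfo "$op_pdf" | awk '/^Pages:/ {print $2}'
        op_status=0
    fi
    rm -rf "$op_tmp"
    return $op_status
}

# Example use:
#   for f in *.odt; do printf '%s: %s\n' "$f" "$(odt_pages "$f")"; done
```

Starting LibreOffice for every file is slow, so for large batches the meta.xml approach is probably the better first choice, with this as a fallback when the stored count is missing.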
Hope this helps.