Check if PDF files are corrupted using command line on Linux
I have many PDF files in one folder.
Is it possible to check whether one or more of them are corrupted (zero pages, or unfinished downloads) using the command line, without opening them one by one?
You can try doing it with pdfinfo (on Fedora it is in the poppler-utils package). pdfinfo reads information about the PDF file from its dictionary, so if it can find that, the file should be OK:
for f in *.pdf; do
    if ! pdfinfo "$f" &> /dev/null; then
        echo "$f is broken"
    fi
done
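The loop above can be generalized a little. What follows is a minimal sketch, not part of the original answer: list_broken is a hypothetical helper name, and it assumes the checker takes the file as its last argument and exits non-zero on failure (which pdfinfo and qpdf --check both do). It also reports zero-byte files without running the checker, which covers downloads that never got started.

```shell
# list_broken DIR CMD [ARGS...]
# Runs "CMD ARGS... FILE" on every *.pdf directly inside DIR and prints the
# files for which the command exits non-zero. Empty files are reported
# without running CMD at all. (Helper name is hypothetical.)
list_broken() {
    dir=$1
    shift
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if [ ! -s "$f" ] || ! "$@" "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

Typical calls would be list_broken . pdfinfo or list_broken ~/Downloads qpdf --check. Only the failing paths are printed, so the output can be piped straight into another command.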
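# Test every PDF under the current directory; rename the ones pdftotext cannot read.
find . -iname '*.pdf' -print0 | while IFS= read -r -d '' f
do
    if pdftotext "$f" - &> /dev/null; then
        echo "$f was ok"
    else
        mv "$f" "$f.broken"
        echo "$f is broken"
    fi
done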
My tool of choice for checking PDFs is qpdf. qpdf has a --check argument that does a good job of finding problems in PDFs.
Check a single PDF with qpdf:
qpdf --check test_file.pdf
Check all PDFs in a directory with qpdf:
find ./directory_to_scan/ -type f -iname '*.pdf' \( -exec sh -c 'qpdf --check "{}" > /dev/null && echo "{}": OK' \; -o -exec echo "{}": FAILED \; \)
Command Explanation:
find ./directory_to_scan/ -type f -iname '*.pdf'
Find all regular files with a '.pdf' extension (case-insensitive).
-exec sh -c 'qpdf --check "{}" > /dev/null && echo "{}": OK' \;
Execute qpdf for each file found and redirect its standard output to /dev/null. Also print the filename followed by ': OK' if the return status of qpdf is 0 (i.e. no errors).
-o -exec echo "{}": FAILED \; \)
This gets executed if qpdf exits with a non-zero status, i.e. if problems are found: print the filename followed by ': FAILED'.
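qpdf --check separates errors from warnings through its exit status: according to the qpdf manual, 0 means no problems, 2 means errors, and 3 means warnings only. The one-liner above reports both 2 and 3 as FAILED. Below is a sketch that keeps them apart; the function names are hypothetical and the labels are my own choice.

```shell
# qpdf_status_label CODE: translate a qpdf exit status into a short label.
# 0 = no problems, 2 = errors, 3 = warnings only (per the qpdf manual).
qpdf_status_label() {
    case $1 in
        0) echo OK ;;
        2) echo FAILED ;;
        3) echo WARNINGS ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# check_with_qpdf FILE...: print "FILE: LABEL" for each argument.
check_with_qpdf() {
    for f in "$@"; do
        qpdf --check "$f" >/dev/null 2>&1
        status=$?
        printf '%s: %s\n' "$f" "$(qpdf_status_label "$status")"
    done
}
```

It could be called as check_with_qpdf ./directory_to_scan/*.pdf. A file that only triggers warnings is often still readable, so treating WARNINGS differently from FAILED may cut down on false alarms.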
Where to get qpdf:
qpdf has both Linux and Windows binaries available at https://github.com/qpdf/qpdf/releases. You could also use your package manager of choice to get it. For example, on Ubuntu you can install qpdf using apt with the command:
apt install qpdf
I got myself an answer:
for x in *.pdf; do echo "$x"; pdfinfo "$x" | grep Pages; done
PDFs with errors will show error messages in place of, or alongside, the page count.
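To avoid reading the output by eye, the page count can be pulled out and tested. This is a rough sketch under the assumption that pdfinfo prints a line of the form "Pages:          12"; page_count and report_pageless are hypothetical helper names.

```shell
# page_count: read pdfinfo output on stdin, print the number after "Pages:".
# Prints nothing if there is no such line.
page_count() {
    awk '/^Pages:/ { print $2; exit }'
}

# report_pageless DIR: list PDFs directly inside DIR for which pdfinfo
# gives no page count, or a page count of 0.
report_pageless() {
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue      # the glob matched nothing
        n=$(pdfinfo "$f" 2>/dev/null | page_count)
        case $n in
            ''|0) printf '%s\n' "$f" ;;
        esac
    done
}
```

Running report_pageless . would then print only the suspicious files.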
None of the methods using pdfinfo or pdftotext worked for me. In fact, they kept giving me false positives and sometimes created files I didn't need.
What did work was JHOVE.
Installation:
Install the jar from the above link and update your PATH environment variable with this command:
echo "export PATH=\$PATH:/REPLACE_WITH/YOUR/PATH_TO/jhove/" >> ~/.bash_profile
Refresh each terminal with
source ~/.bash_profile
and you're good to start using it system wide.
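Before relying on the PATH change, it may be worth confirming that the shell can actually find jhove. A small sketch using the POSIX command -v builtin; have_tool is a hypothetical helper name:

```shell
# have_tool NAME: succeed if NAME resolves to a command in the current shell.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}
```

For example, have_tool jhove || echo "jhove is not on PATH" >&2 prints a message only when the lookup fails.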
Basic Usage:
jhove -m pdf-hul someFile.pdf
You'll get a lot of info about the PDF - more than most people probably need.
Bash One-Liner:
Simply returns valid or invalid:
if [[ $(jhove -m pdf-hul someFile.pdf | grep -a "Status:") == *"Well-Formed and valid"* ]]; then echo "valid"; else echo "invalid"; fi;
Note that this was run on Mac OS X, but I assume it works the same in any Unix-based Bash environment.
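For a whole folder, the one-liner can be wrapped in a loop. The following is only a sketch: it assumes the same "Status:" line and "Well-Formed and valid" wording as the one-liner above, and both function names are hypothetical.

```shell
# jhove_says_valid: read JHOVE's text report on stdin; succeed only if the
# Status line reads "Well-Formed and valid".
jhove_says_valid() {
    grep -a 'Status:' | grep -q 'Well-Formed and valid'
}

# invalid_pdfs FILE...: print each argument that JHOVE does not report as valid.
invalid_pdfs() {
    for f in "$@"; do
        if ! jhove -m pdf-hul "$f" 2>/dev/null | jhove_says_valid; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling invalid_pdfs *.pdf lists the files that fail the check. JHOVE starts a JVM for every file, so this will be noticeably slower than the pdfinfo or qpdf loops on large folders.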