Check if PDF files are corrupted using command line on Linux

I have many PDF files in one folder.

Is it possible to check whether one or more files are corrupted (zero pages, or unfinished downloads) using the command line, without needing to open them one by one?


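You can try doing it with pdfinfo (on Fedora it is in the poppler-utils package). pdfinfo reads information about the PDF file from its dictionary, so if it succeeds, the file should be OK. pdfinfo exits with a non-zero status when it cannot read the file, which is what the loop tests: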

for f in *.pdf; do
  if ! pdfinfo "$f" &> /dev/null; then
    echo "$f" is broken
  fi
done

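A recursive variant that uses pdftotext and renames every broken file:

find . -iname '*.pdf' | while IFS= read -r f; do
    # "-" sends the extracted text to stdout, so no .txt files are created
    if pdftotext "$f" - &> /dev/null; then
        echo "$f was ok"
    else
        mv "$f" "$f.broken"
        echo "$f is broken"
    fi
done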
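
If you would rather not rename anything, the same idea can be sketched as a small POSIX shell function that only lists the files a checker rejects. The name scan_pdfs and its calling convention are made up for this example. It assumes the checker is an external command that takes the file as its last argument and exits non-zero on failure.

```shell
# scan_pdfs DIR CHECKER [ARGS...]   (hypothetical helper, not part of poppler)
# Runs "CHECKER ARGS... FILE" for every *.pdf below DIR and prints the
# path of each file for which the checker exits non-zero.
scan_pdfs() {
  dir=$1
  shift
  # Refuse to run without a checker command.
  [ "$#" -ge 1 ] || return 2
  # The file name is passed to the inner shell as an argument, never pasted
  # into the script text, so quotes or spaces in names are harmless.
  find "$dir" -type f -iname '*.pdf' -exec sh -c '
    f=$1
    shift
    "$@" "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
  ' sh {} "$@" \;
}
```

For example, scan_pdfs . pdfinfo lists everything pdfinfo cannot read. pdftotext wants the "-" after the file name, so it needs a tiny wrapper: scan_pdfs . sh -c 'pdftotext "$1" -' sh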

My tool of choice for checking PDFs is qpdf. qpdf has a --check option that does a good job of finding problems in PDFs.

Check a single PDF with qpdf:

qpdf --check test_file.pdf

Check all PDFs in a directory with qpdf:

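find ./directory_to_scan/ -type f -iname '*.pdf' \( -exec sh -c 'qpdf --check "$1" > /dev/null 2>&1 && echo "$1: OK"' sh {} \; -o -exec echo "{}": FAILED \; \)

Command Explanation:

  • find ./directory_to_scan/ -type f -iname '*.pdf' Find all regular files with a '.pdf' extension (case-insensitive)

  • -exec sh -c 'qpdf --check "$1" > /dev/null 2>&1 && echo "$1: OK"' sh {} \; Execute qpdf for each file found and redirect all of its output (stdout and stderr) to /dev/null. The file name is passed to sh as $1 rather than pasted into the script, so names containing quotes or spaces are safe. Print the filename followed by ': OK' if the return status of qpdf is 0 (i.e. no errors)

  • -o -exec echo "{}": FAILED \; \) This gets executed if qpdf returned a non-zero status: Print filename followed by ": FAILED". qpdf also returns non-zero when it finds only warnings, so those files are listed as FAILED too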
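
If you want to tell warnings apart from real errors, a minimal sketch along these lines may help. It relies on qpdf's documented exit statuses (0 for no problems, 2 for errors, 3 for warnings only). The function names qpdf_status_label and check_with_qpdf are invented for this example.

```shell
# qpdf_status_label CODE   (hypothetical helper)
# Maps a qpdf exit status to a short label.
qpdf_status_label() {
  case $1 in
    0) echo "OK" ;;
    3) echo "WARNINGS" ;;   # readable, but qpdf noticed something odd
    *) echo "FAILED" ;;     # 2 = errors; anything else, e.g. qpdf not installed
  esac
}

# check_with_qpdf FILE   (hypothetical helper)
# Runs qpdf --check quietly and prints "FILE: LABEL".
check_with_qpdf() {
  qpdf --check "$1" > /dev/null 2>&1
  rc=$?
  printf '%s: %s\n' "$1" "$(qpdf_status_label "$rc")"
}
```

Usage would be something like: for f in ./directory_to_scan/*.pdf; do check_with_qpdf "$f"; done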


Where to get qpdf:

qpdf has both Linux and Windows binaries available at: https://github.com/qpdf/qpdf/releases. You could also use your package manager of choice to get it. For example on Ubuntu you can install qpdf using apt with the command:

apt install qpdf

I got myself an answer:

for x in *.pdf; do echo "$x"; pdfinfo "$x" | grep Pages; done

For a damaged PDF, pdfinfo prints error messages instead of (or alongside) the Pages line, so broken files stand out in the output.
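
To avoid reading the output by eye, the page count can be pulled out of the pdfinfo output and tested, which also catches the zero-pages case from the question. This is only a sketch: the helper names are invented, and it assumes the "Pages:" line has the usual "key, colon, spaces, value" layout.

```shell
# pages_from_pdfinfo   (hypothetical helper)
# Reads pdfinfo output on stdin and prints the page count,
# or 0 when there is no "Pages:" line at all.
pages_from_pdfinfo() {
  awk -F': *' '
    $1 == "Pages" { print $2; found = 1; exit }
    END { if (!found) print 0 }
  '
}

# report_page_counts DIR   (hypothetical helper)
# Prints one line per *.pdf in DIR: the page count, or a warning.
report_page_counts() {
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue          # the glob matched nothing
    n=$(pdfinfo "$f" 2> /dev/null | pages_from_pdfinfo)
    # A non-numeric or zero count, or a missing pdfinfo, lands in the else branch.
    if [ "$n" -gt 0 ] 2> /dev/null; then
      echo "$f: $n pages"
    else
      echo "$f: broken or empty"
    fi
  done
}
```

Run it as report_page_counts . to scan the current folder.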


None of the methods using pdfinfo or pdftotext worked for me. They kept giving me false positives and sometimes created files I didn't need.

What did work was JHOVE.

Installation:

Install the jar from the above link and update your PATH environment variable with this command:

echo "export PATH=\$PATH:/REPLACE_WITH/YOUR/PATH_TO/jhove/" >> ~/.bash_profile

Reload the profile in each open terminal with source ~/.bash_profile and you're good to start using it system-wide.

Basic Usage:

jhove -m pdf-hul someFile.pdf

You'll get a lot of info about the pdf - more than most people probably need.

Bash One-Liner:
Simply returns valid or invalid:

if [[ $(jhove -m pdf-hul someFile.pdf | grep -a "Status:") == *"Well-Formed and valid"* ]]; then echo "valid"; else echo "invalid"; fi;

Note that this was run on Mac OS X, but I assume it works the same in any Unix-based Bash environment.
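
To run the same test over a whole folder, something like the following sketch should do. The function names are invented here, and it assumes the jhove output contains a "Status:" line with the text "Well-Formed and valid" for good files, exactly as the one-liner above expects.

```shell
# is_jhove_valid   (hypothetical helper)
# Reads jhove output on stdin; succeeds only if a Status line
# says "Well-Formed and valid".
is_jhove_valid() {
  grep -a 'Status:' | grep -q 'Well-Formed and valid'
}

# jhove_check_dir DIR   (hypothetical helper)
# Prints "FILE: valid" or "FILE: invalid" for every *.pdf in DIR.
jhove_check_dir() {
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if jhove -m pdf-hul "$f" 2> /dev/null | is_jhove_valid; then
      echo "$f: valid"
    else
      echo "$f: invalid"             # also printed if jhove is not on the PATH
    fi
  done
}
```

Call it as jhove_check_dir . for the current folder.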