How can I fix/repair a corrupted PDF file?

Ghostscript will often repair a corrupted PDF automatically, provided it can open the file in the first place (that is, the file is not damaged beyond repair). You will still need to double-check the result afterwards.

On Linux, try this command:

 gs \
  -o repaired.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
   corrupted.pdf

On Windows, try this one:

 gswin32c.exe ^
  -o repaired.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/prepress ^
   corrupted.pdf
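If you need the same call on both platforms, the two commands above can be wrapped in a small script. This is only a sketch: the function names `build_gs_command` and `repair_with_ghostscript` are made up for illustration, and the list of executable names it searches for (`gs`, `gswin64c`, `gswin32c`) is an assumption about a typical install. The Ghostscript options themselves are exactly the ones shown above.

```python
import shutil
import subprocess


def build_gs_command(src, dst, executable="gs"):
    """Return the Ghostscript argument list used in the commands above."""
    return [
        executable,
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        src,
    ]


def repair_with_ghostscript(src, dst):
    """Run Ghostscript on src and write dst; return True on exit code 0."""
    # Assumed executable names; adjust to whatever your install provides.
    for name in ("gs", "gswin64c", "gswin32c"):
        exe = shutil.which(name)
        if exe:
            break
    else:
        raise FileNotFoundError("no Ghostscript executable found on PATH")
    result = subprocess.run(build_gs_command(src, dst, exe))
    return result.returncode == 0


if __name__ == "__main__":
    ok = repair_with_ghostscript("corrupted.pdf", "repaired.pdf")
    print("Ghostscript finished cleanly" if ok else "Ghostscript reported an error")
```

A zero exit code only means Ghostscript did not give up; it does not prove that every page survived, so open the repaired file and look at it.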

I had a corrupted PDF file, print.pdf, that Ghostscript couldn't open, but that the usual graphical Linux PDF viewers (Okular, Evince) opened fine. (A hex editor showed that the file started with garbage instead of a PDF header.)

These PDF viewers use Poppler as their back-end PDF renderer, so you can repair the PDF using Poppler's command-line tools. On Ubuntu these are in the poppler-utils package. I used:

pdftocairo -pdf print.pdf print_repaired.pdf

This generated a PDF file with a correct header, which tools like Ghostscript then accepted.
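If you don't have a hex editor at hand, a few lines of Python can tell you whether a file has the same kind of problem, i.e. junk in front of the `%PDF-` marker. This is a rough sketch, not a validator: the function names are hypothetical, and the 1024-byte search window is an assumption based on how tolerant viewers commonly behave.

```python
import sys


def find_pdf_header(data, window=1024):
    """Return the offset of b'%PDF-' within the first `window` bytes, or -1."""
    return data[:window].find(b"%PDF-")


def describe_header(path):
    """Return a short human-readable verdict about the start of the file."""
    with open(path, "rb") as f:
        offset = find_pdf_header(f.read(1024))
    if offset == 0:
        return "header is at the very start of the file"
    if offset > 0:
        return f"{offset} byte(s) of junk before the header"
    return "no %PDF- marker in the first 1024 bytes"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe_header(sys.argv[1]))
```

Run it as `python check_header.py print.pdf` (the script name is arbitrary). A non-zero offset matches the situation described above, where a Poppler-based tool may still be able to read the file.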

mutool (project page, manpage) will repair broken PDFs without printing them.

Installation e.g. on Ubuntu: sudo apt-get install mupdf-tools
Run it like this: mutool clean input.pdf output.pdf

mutool clean [options] input.pdf [output.pdf] [pages]

  The clean command pretty prints and rewrites the syntax of a PDF file.
   It can be used to repair broken files, expand compressed streams,
   filter out a range of pages, etc.
  If no output file is specified, it will write the cleaned PDF to
   "out.pdf" in the current directory.

Alternatively, there are a few tools and frameworks that can decompose/decompile PDFs into their components without rendering them. These can be useful for extracting text, scripts, and images. See the answers to this question for a list of such tools: https://reverseengineering.stackexchange.com/q/1526/8210. For example, you can try Origami, currently the top answer there; it has a GTK-based viewer.

I had a corrupted PDF file because the PHP script used to download it echoed some errors (as HTML) and NUL characters at the end of the file.

The solution was to open the PDF in Notepad++ and remove everything after the line

%%EOF
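The same cleanup can be scripted if you have many such files or no suitable editor. The sketch below keeps everything up to and including the last `%%EOF` marker and drops the rest. The function names are hypothetical, and it assumes the trailing junk does not itself contain the text `%%EOF`; if it does, the cut will land in the wrong place.

```python
def truncate_after_eof(data):
    """Return data cut off right after the last b'%%EOF' marker."""
    marker = b"%%EOF"
    pos = data.rfind(marker)
    if pos == -1:
        raise ValueError("no %%EOF marker found")
    # Keep the marker itself and finish with a single newline.
    return data[:pos + len(marker)] + b"\n"


def clean_file(src, dst):
    """Read src, strip trailing junk, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(truncate_after_eof(data))
```

Searching for the last marker rather than the first matters because a PDF that has been saved incrementally can legitimately contain several `%%EOF` lines. Write to a new file, as `clean_file` does, so the original stays available if the result turns out wrong.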
