How can I deskew and crop PDFs made from scanned pages *automatically*? [duplicate]

Have a look at deskew. It's a commandline tool. The download (a *.zip file) seems to include binaries for Windows, Mac OS X and Linux.

The license is MPL (Mozilla) or LGPL (GNU), whichever you prefer.

The only drawback for you seems to be that it doesn't consume PDFs, only PNG and TIFF images (AFAICS). That means you'll have to set up a workflow something like this:

 PDF.orig -> PNG.orig -> PNG.deskewed -> PDF.deskewed

I haven't tested it myself (yet), I just came across the website recently and bookmarked it.
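
The workflow above could be sketched roughly as follows. This is an untested outline, not a finished tool. It assumes Ghostscript (`gs`) renders the pages, that deskew accepts `-o <output>` followed by the input file, and that `img2pdf` (my pick, any image-to-PDF converter would do) reassembles the pages. The function names and the `page-%03d.png` naming are made up for illustration.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, out_dir, dpi=300):
    """PDF.orig -> PNG.orig: Ghostscript writes one PNG per page."""
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m",
            f"-r{dpi}", f"-sOutputFile={pattern}", str(pdf)]


def deskew_cmd(src_png, dst_png):
    """PNG.orig -> PNG.deskewed."""
    # Assumption: deskew takes "-o <output>" followed by the input file.
    return ["deskew", "-o", str(dst_png), str(src_png)]


def assemble_cmd(pngs, out_pdf):
    """PNG.deskewed -> PDF.deskewed."""
    # Assumption: img2pdf is installed; swap in whatever converter you have.
    return ["img2pdf", *map(str, pngs), "-o", str(out_pdf)]


def deskew_pdf(pdf, out_pdf, work_dir, dpi=300):
    """Run the whole chain for one PDF, keeping intermediates in work_dir."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, work, dpi), check=True)
    fixed = []
    # Zero-padded page numbers keep the sorted order equal to page order.
    for page in sorted(work.glob("page-*.png")):
        target = page.with_name("deskewed-" + page.name)
        subprocess.run(deskew_cmd(page, target), check=True)
        fixed.append(target)
    subprocess.run(assemble_cmd(fixed, out_pdf), check=True)
```

Something like `deskew_pdf("scan.pdf", "scan-deskewed.pdf", "work")` would then run the chain. Note that the result is an image-only PDF at the chosen resolution; any text layer in the original is lost.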


Oh, let me add another answer. I just remembered netpbm. Haven't used it in years, but I think I should take a fresh look...

netpbm is a very powerful commandline toolkit for manipulating graphic images. It ships nearly 300 separate tools, including converters for about 100 graphics formats.

And it also has a commandline tool that can rotate images:

pnmrotate

And it has another tool that tries to discover the angle of rotated images:

pamtilt

pamtilt prints a floating-point number, its guess at the image's rotation angle. So automatic de-skewing of images should be within reach. A shell script could be written to do that. It would require several steps:

  1. Convert the PDF page to a netpbm-suitable image format with the help of Ghostscript.
  2. Use pamtilt to auto-discover the skew angle of the image.
  3. Use pnmrotate to de-skew the image.
  4. Re-convert the image to PDF.
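
Pending a real script, the four steps above might look roughly like this. It is an untested sketch under several assumptions: Ghostscript's `pgmraw` device is used so that pamtilt gets a grayscale image, pamtilt prints just one number on standard output, and `pnmtops` piped into `ps2pdf` is one possible way back to PDF. Whether pamtilt's angle has to be negated before it goes to pnmrotate should be checked on a sample page, hence the made-up `negate` switch. All function names are hypothetical.

```python
import subprocess


def page_to_pgm_cmd(pdf, page, out_pgm, dpi=300):
    # Step 1: Ghostscript renders a single page as a grayscale PGM.
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pgmraw",
            f"-r{dpi}", f"-dFirstPage={page}", f"-dLastPage={page}",
            f"-sOutputFile={out_pgm}", str(pdf)]


def parse_tilt(output):
    # Step 2: assumption: pamtilt's stdout is one number (degrees).
    return float(output.strip())


def rotate_cmd(angle, pgm, negate=False):
    # Step 3: pnmrotate takes the angle in degrees, then the input file.
    # Assumption: flip `negate` if pages come out more skewed than before.
    if negate:
        angle = -angle
    return ["pnmrotate", f"{angle:g}", str(pgm)]


def deskew_page(pgm_in, pgm_out, negate=False):
    # Steps 2 and 3 together: measure the tilt, then rotate by it.
    tilt = subprocess.run(["pamtilt", str(pgm_in)], check=True,
                          capture_output=True, text=True).stdout
    with open(pgm_out, "wb") as out:
        subprocess.run(rotate_cmd(parse_tilt(tilt), pgm_in, negate),
                       check=True, stdout=out)


def pgm_to_pdf(pgm, out_pdf):
    # Step 4: one possible route, PNM -> PostScript -> PDF.
    ps = subprocess.run(["pnmtops", "-noturn", str(pgm)], check=True,
                        capture_output=True).stdout
    subprocess.run(["ps2pdf", "-", str(out_pdf)], input=ps, check=True)
```

For one page this would be run as `page_to_pgm_cmd(...)` through `subprocess.run`, then `deskew_page("p.pgm", "p-fixed.pgm")`, then `pgm_to_pdf("p-fixed.pgm", "p.pdf")`. Page size and resolution handling in step 4 would still need tuning.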

If you provide me access to a small sample of your PDF files I could try and come up with a shell script to accomplish the feat.


(I'm quite surprised that [netpbm] doesn't seem to have a tag here on superuser+stackoverflow.)