Convert PDF 2 sides per page to 1 side per page

How can I convert a PDF with 2 sides per page to 1 side per page?


OK, the problem was already solved with the help of Acrobat (full version, not Reader). But what to do if you don't have access to Acrobat? Could this be done with Ghostscript and pdftk as well?

How to solve this with the help of Ghostscript...

...and for the fun of it, let's not use an input file with "double-up" pages, but one with "treble-up". Actually, I received one such PDF today by email. It was a flyer with a Leporello (zigzag) fold. The sheet size was A4 landscape (842 pt x 595 pt), and it was folded and laid out like this:

Front side to be printed, page 1 of PDF
+--------+--------+--------+   ^
|        |        |        |   |
|   5    |   6    |   1    |   |
|        |        |        | 595 pt
|        |        |        |   |
|        |        |        |   |
|        |        |        |   |
+--------+--------+--------+   v
         ^        ^
        fold     fold
         v        v
+--------+--------+--------+   ^
|        |        |        |   |
|   2    |   3    |   4    |   |
|        |        |        | 595 pt
|        |        |        |   |
|        |        |        |   |
|        |        |        |   |
+--------+--------+--------+   v
Back side to be printed, page 2 of PDF
<---------- 842 pt -------->

I want to create 1 PDF with 6 pages, each of which has the unusual size of 280.67 pt x 595 pt (one third of the 842 pt sheet width).

First Step

Let's first extract the left sections from each of the input pages:

gswin32c.exe ^
    -o left-sections.pdf ^
    -sDEVICE=pdfwrite ^
    -g2807x5950 ^
    -c "<</PageOffset [0 0]>> setpagedevice" ^
    -f myflyer.pdf

What did these parameters do?

  • -o ...............: Names output file. Implicitly also sets -dBATCH -dNOPAUSE. (-dSAFER is not implied by -o; Ghostscript 9.50 and later enable it by default.)
  • -sDEVICE=pdfwrite : we want PDF as output format.
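  • -g................: sets output media size in pixels. pdfwrite's default resolution is 720 dpi, so 1 pt (1/72 inch) equals 10 pixels. Hence multiply the size in points by 10 to get a match for PageOffset: 280.67 pt x 595 pt becomes 2807 x 5950 pixels.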
  • -c "..............: asks Ghostscript to process the given PostScript code snippet just before the main input file (which needs to follow with -f).
  • <</PageOffset ....: sets shifting of page image on the medium. (Of course, for left pages the shift by [0 0] has no real effect.)
  • -f ...............: process this input file.
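
The -g arithmetic can be double-checked with a few lines of Python. This is only a sketch: the helper name points_to_pixels is made up for illustration, and the 720 dpi default is the pdfwrite resolution mentioned in the list above.

```python
POINTS_PER_INCH = 72  # PostScript/PDF points per inch


def points_to_pixels(points, dpi=720):
    """Convert a length in points to device pixels at the given resolution."""
    return round(points * dpi / POINTS_PER_INCH)


sheet_width_pt, sheet_height_pt = 842, 595  # A4 landscape, as in the flyer
section_width_pt = sheet_width_pt / 3       # three sections side by side

# Build the -g argument for one section of the sheet
geometry = f"-g{points_to_pixels(section_width_pt)}x{points_to_pixels(sheet_height_pt)}"
print(geometry)
```

For a different resolution (set with -r), pass it as the dpi argument; the PageOffset values stay in points either way.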

Which result did the last command achieve?

This one:

Output file: left-sections.pdf, page 1
+--------+  ^
|        |  |
|   5    |  |
|        |595 pt
|        |  |
|        |  |
|        |  |
+--------+  v

Output file: left-sections.pdf, page 2
+--------+  ^
|        |  |
|   2    |  |
|        |595 pt
|        |  |
|        |  |
|        |  |
+--------+  v
< 280 pt >

Second Step

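Now let's do the analogous thing for the center sections. To bring the center section onto the medium, the page image has to be shifted one section width to the left, hence the negative x offset:

gswin32c.exe ^
    -o center-sections.pdf ^
    -sDEVICE=pdfwrite ^
    -g2807x5950 ^
    -c "<</PageOffset [-280.67 0]>> setpagedevice" ^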
    -f myflyer.pdf

Result:

Output file: center-sections.pdf, page 1
+--------+  ^
|        |  |
|   6    |  |
|        |595 pt
|        |  |
|        |  |
|        |  |
+--------+  v

Output file: center-sections.pdf, page 2
+--------+  ^
|        |  |
|   3    |  |
|        |595 pt
|        |  |
|        |  |
|        |  |
+--------+  v
< 280 pt >

Third Step

Last, the right sections, which need a shift of two section widths to the left:

gswin32c.exe ^
    -o right-sections.pdf ^
    -sDEVICE=pdfwrite ^
    -g2807x5950 ^
    -c "<</PageOffset [-561.34 0]>> setpagedevice" ^
    -f myflyer.pdf

Result:

Output file: right-sections.pdf, page 1
+--------+  ^
|        |  |
|   1    |  |
|        |595 pt
|        |  |
|        |  |
|        |  |
+--------+  v

Output file: right-sections.pdf, page 2
+--------+  ^
|        |  |
|   4    |  |
|        |595 pt
|        |  |
|        |  |
|        |  |
+--------+  v
< 280 pt >
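
The three Ghostscript calls differ only in the output name and the x offset, so they could also be generated. Here is a rough Python sketch of that idea. The function name section_commands is hypothetical, the executable name is simply the one used in the commands above, and the sketch assumes that a negative PageOffset x value shifts the page image to the left.

```python
def section_commands(input_pdf, names, sheet_width_pt=842, sheet_height_pt=595,
                     gs_exe="gswin32c.exe"):
    """Return one Ghostscript argument list per vertical section, left to right."""
    section_pt = round(sheet_width_pt / len(names), 2)
    # pdfwrite defaults to 720 dpi, so 1 pt corresponds to 10 pixels
    geometry = f"-g{round(section_pt * 10)}x{round(sheet_height_pt * 10)}"
    commands = []
    for index, name in enumerate(names):
        # Shift the page image left by `index` section widths (no shift for the first)
        x_offset = "0" if index == 0 else f"-{index * section_pt:.2f}"
        commands.append([
            gs_exe,
            "-o", f"{name}-sections.pdf",
            "-sDEVICE=pdfwrite",
            geometry,
            "-c", f"<</PageOffset [{x_offset} 0]>> setpagedevice",
            "-f", input_pdf,
        ])
    return commands


commands = section_commands("myflyer.pdf", ["left", "center", "right"])
for command in commands:
    print(command)
```

Each list could be handed to subprocess.run() as-is, which avoids any shell quoting issues with the PostScript snippet.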

Last Step

Now we combine the pages into one file:

pdftk.exe ^
  A=right-sections.pdf ^
  B=center-sections.pdf ^
  C=left-sections.pdf ^
  cat A1 C2 B2 A2 C1 B1 ^
  output single-files-input.pdf ^
  verbose

Done. Here is the desired result: 6 separate pages, each sized 280.67 pt x 595 pt.

Result:

+--------+  +--------+  +--------+  +--------+  +--------+  +--------+   ^
|        |  |        |  |        |  |        |  |        |  |        |   |
|   1    |  |   2    |  |   3    |  |   4    |  |   5    |  |   6    |   |
|        |  |        |  |        |  |        |  |        |  |        | 595 pt
|        |  |        |  |        |  |        |  |        |  |        |   |
|        |  |        |  |        |  |        |  |        |  |        |   |
|        |  |        |  |        |  |        |  |        |  |        |   |
+--------+  +--------+  +--------+  +--------+  +--------+  +--------+   v
< 280 pt >  < 280 pt >  < 280 pt >  < 280 pt >  < 280 pt >  < 280 pt > 
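
The page order for pdftk's cat operation follows directly from the two layout diagrams at the top. As a sanity check, here is a small Python sketch that derives it. The dictionary layout and the function name cat_order are made up for illustration; the handles A, B and C are the ones assigned in the pdftk command.

```python
# Panel numbers on each sheet side, as drawn in the layout diagrams
layout = {
    1: {"left": 5, "center": 6, "right": 1},  # PDF page 1 (front side)
    2: {"left": 2, "center": 3, "right": 4},  # PDF page 2 (back side)
}
# pdftk handles: A=right-sections.pdf, B=center-sections.pdf, C=left-sections.pdf
handles = {"right": "A", "center": "B", "left": "C"}


def cat_order(layout, handles):
    """Return pdftk page references sorted by the panel number they contain."""
    by_panel = {}
    for pdf_page, sections in layout.items():
        for position, panel in sections.items():
            by_panel[panel] = f"{handles[position]}{pdf_page}"
    return [by_panel[panel] for panel in sorted(by_panel)]


print(" ".join(cat_order(layout, handles)))  # prints: A1 C2 B2 A2 C1 B1
```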

Just had the same problem. I stumbled upon Briss, an open-source Java GUI tool for separating and cropping PDF pages:

http://sourceforge.net/projects/briss/

It worked very well for me, on Linux, even though the user interface isn't completely trivial. It even worked with a PDF with some differently sized pages!


@peims, thanks. Here's the step-by-step version of your method. I tried it on a file I wanted to convert for my Kindle DX, and it works perfectly:

  • Use the full version of Acrobat v9 to crop the left hand side of the page, and save it as "left.pdf":
    • Use the crop tool to mark the left hand side of the page.
    • Right click, and select "Set Cropbox".
    • Select "Document..Crop Pages", and apply the crop to the whole document.
  • Repeat with the right hand pages, save as "right.pdf".
  • At this stage, you have two documents: "left.pdf" with the left hand pages, and "right.pdf" with the right hand pages.

Next, use pdftk.exe (from http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/) to interleave the results into a single file. Copy "pdftk.exe", "left.pdf" and "right.pdf" to "D:\", and execute:

  • D:\>pdftk D:\left.pdf burst output %05d_A.pdf
  • D:\>pdftk D:\right.pdf burst output %05d_B.pdf
  • D:\>pdftk *_?.pdf cat output combined.pdf

Note: if you copy the files to "C:\", it won't work under Windows 7, because writing to the root of the system drive is blocked by security permissions. If you don't have a D:\, then create a directory "C:\x" to complete the operation.
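
Why does the wildcard interleave the pages? The zero-padded page number comes first in each file name, so an alphabetical sort alternates between the _A and _B files. A small Python sketch of the naming (burst_names is a made-up helper, and it assumes the wildcard is expanded in alphabetical order):

```python
def burst_names(page_count, suffix):
    """File names produced by the burst pattern %05d_<suffix>.pdf (pages start at 1)."""
    return [f"{page:05d}_{suffix}.pdf" for page in range(1, page_count + 1)]


# Three left-hand pages (A) and three right-hand pages (B)
names = burst_names(3, "A") + burst_names(3, "B")
interleaved = sorted(names)
print(interleaved)
```

The sorted list starts with 00001_A.pdf, 00001_B.pdf, 00002_A.pdf, which is the reading order of the original book.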

These results would normally be good enough. However, there are two more optional steps to improve the output.

  • (optional final step 1) At this stage, the document is huge (my doc ballooned from 7MB to 80MB), so you can reduce the file size by using either:
    • "Advanced..PDF Optimizer", or:
    • "Advanced..Preflight" with the "Compatible with Acrobat 5" setting.
  • (optional final step 2) The pages are all different sizes. Repeat the crop on all pages, so everything is a uniform size.

You could duplicate the document, then crop the pages so that only the even page numbers show in one file and only the odd page numbers in the other. Then split the files into single pages and recombine to make one document with single sides to a page...

You can do this using a number of methods, for example:

  1. Use the Adobe Acrobat crop tool to crop out one side of the double page, and apply the crop to all pages.
  2. Split the files into individual pages using the 'burst' command in pdftk
  3. Rename the files sequentially using a file renaming tool (e.g. ReNamer)
  4. Recombine the pages using the 'cat' command in pdftk

I use the following script to process scanned books on Mac and Linux. This can get quite memory intensive.

#!/bin/bash
#
# This script processes scanned books. After scanning the books have been cropped with
# Preview. This does kind of a "soft crop" that we need to make a bit "harder". 
#
# The second step is to take each page of the PDF and split it into two pages,
# because each page of the scanned document actually contains two pages of the book.
#
#

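# Keep all temporary files in one private directory, so that no stray
# extension-less temp files are left behind by mktemp.
WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/splitpages.XXXXXX")
FILE="$WORKDIR/input.pdf"
FILE2="$WORKDIR/validated.pdf"
FILE3="$WORKDIR/pages.pdf"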

echo "Making a temporary copy of the input file."

cp "$1" "$FILE"

#
# Start cropping
#

echo "Cropping the PDF"

# The first regex removes all boxes but CropBox. The second regex renames the CropBox to MediaBox.
# Note: perl -p works line by line, so a box definition that spans several lines is not matched.

perl -pi.bak -e 's/\/(Media|Bleed|Art|Trim)Box\s*\[(.+?)\]//msg;' "$FILE"
perl -pi.bak -e 's/CropBox/MediaBox/g;' "$FILE"

echo "Validating the PDF"

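# Run pdftk to ensure that the file is OK. The in-place edits above changed
# byte offsets inside the file, so let pdftk read it and write a clean copy.

pdftk "$FILE" output "$FILE2"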

#
# Done cropping, start splitting the pages
#

echo "Splitting the pages in two and changing to 200 dpi with imagemagick. Output goes to $FILE3"

convert -density 200 "$FILE2" -crop 50%x0 +repage "$FILE3"

#
# Done splitting, copy the result to a new file
#


mv "$FILE3" "$1.pages.pdf"
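
To see what the two Perl substitutions in the script do, here is the same idea as a small Python sketch applied to a single, made-up page dictionary. The function name harden_crop and the sample string are only for illustration; real PDFs are binary files, and this sketch does not attempt to handle them.

```python
import re

# Same pattern as in the script: any Media/Bleed/Art/Trim box with its rectangle
BOX_RE = re.compile(r"/(Media|Bleed|Art|Trim)Box\s*\[(.+?)\]", re.S)


def harden_crop(page_dict):
    """Drop every box except CropBox, then promote CropBox to MediaBox."""
    without_boxes = BOX_RE.sub("", page_dict)
    return without_boxes.replace("CropBox", "MediaBox")


sample = ("<< /Type /Page /MediaBox [0 0 842 595] "
          "/CropBox [20 20 822 575] /TrimBox [0 0 842 595] >>")
result = harden_crop(sample)
print(result)
```

After the substitution the former crop rectangle is the only box left, so later tools such as ImageMagick see the cropped area as the full page.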