Automate removing the first page of a bunch of PDFs
Solution 1:
I've done this using the Coherent PDF Command Line Tools Community Release.
You can download either the pre-built tools or the source code to compile the later on your own, however the latter requires OCaml be installed when compiling. So the pre-built tools are the easiest way to go. The downloaded distribution file, e.g. cpdf-binaries-master.zip, contains binaries for Linux, OS X/macOS and Windows and is ~5 MB in size.
Once downloaded and extracted (double-clicking the .zip file) you'd copy the, e.g. ~/Downloads/cpdf-binaries-master/OSX-Intel/cpdf, file to a location that is defined in the PATH
environment variable, e.g. /usr/local/bin/
to make it globally available on the command line in Terminal. If it's not in the PATH
then you'll have to use the fully qualified pathname to the cpdf
executable or ./cpdf
if it's in the present working directory (pwd
). In Terminal, type echo $PATH
so show the PATH
.
The syntax for removing the first page when the PDF file has 2 or more pages is:
cpdf in.pdf 2-end -o out.pdf
Because cpdf
reads the original file (in.pdf) and writes to a new file (out.pdf) the out.pdf filename needs to be different if saved to the same location as the in.pdf file, or save it to a different location with the same in.pdf filename as the out.pdf filename, or whatever out.pdf filename you want.
Below, I'll show two examples of automation using cpdf
to remove the first page of a PDF file, assuming it has two or more pages. One using an Automator workflow as a Service available in Finder on the Services Context Menu and the other as a bash
script, to use in Terminal.
As an Automator Service Workflow available in Finder on the Services Context Menu:
In Automator create a new service workflow using the settings as shown in the image below and copy and paste the code below the image into the Run Shell Script action and save as e.g.: Remove First Page from PDF
To use Remove First Page from PDF, in Finder select the PDF files you want to remove the first page from and then select Remove First Page from PDF from the Context Menu via right-click or control-click, or from Finder > Services > Remove First Page from PDF
for f in "$@"; do
# Get Page Count.
p="$(/usr/local/bin/cpdf -info "$f" | awk '/Pages:/{print $2}')"
# Get file extension.
ext="${f##*.}"
# Get filename without extension.
fn="${f%.*}"
# Rename original file to "filename (original).pdf".
# Use '-n' to not overwrite an existing file.
mv -n "$f" "${fn} (original).${ext}"
case "$p" in
1)
# PDF file has 1 page. Make a copy to the
# original name for consistency of output.
# Use '-n' to not overwrite an existing file.
cp -n "${fn} (original).${ext}" "$f"
;;
*)
# PDF file has 2 or more pages.
# Remove the first page.
/usr/local/bin/cpdf "${fn} (original).${ext}" 2-end -o "$f"
;;
esac
# If you don't want to keep the original
# file, remove '#' from the next line.
# rm "${fn} (original).${ext}"
done
Note that the PATH
passed a Run Shell Script action in Automator is, /usr/bin:/bin:/usr/sbin:/sbin
. So the code above is using the fully qualified pathname to the cpdf
executable, /usr/local/bin/cpdf
as that is where I placed it so as to be available in Terminal by way of using its name cpdf
, only.
Also note that if you do not want to keep the original files, then uncomment (remove the #
from in front of) the # rm "${fn} (original).${ext}"
command, just above the last line of code done
.
As a bash
script to use in Terminal:
Create the bash
script in the following manner:
In Terminal:
touch rfpfpdf
open rfpfpdf
Copy the code block, starting with #!/bin/bash
, below into the opened rfpfpdf
document and then save it.
Back in Terminal:
Make the script executable:
chmod u+x rfpfpdf
Now move the rfpfpdf
script to, e.g.: /usr/local/bin/
sudo mv rfpfpdf /usr/local/bin/
You can then change directory cd ...
to a directory that has the PDF files that you want to remove the first page from and then simply type rfpfpdf
and press enter.
The original files will be moved to "filename (original).pdf" and the newly create PDF file sans the first page, if 2 or more pages, will have the original filename.pdf
name.
#!/bin/bash
for f in *.pdf *.PDF; do
if [[ -f $f ]]; then
# Get Page Count.
p="$(cpdf -info "$f" | awk '/Pages:/{print $2}')"
# Get file extension.
ext="${f##*.}"
# Get filename without extension.
fn="${f%.*}"
# Rename original file to "filename (original).pdf".
# Use '-n' to not overwrite an existing file.
mv -n "$f" "${fn} (original).${ext}"
case "$p" in
1)
# PDF file has 1 page. Make a copy to the
# original name for consistency of output.
# Use '-n' to not overwrite an existing file.
cp -n "${fn} (original).${ext}" "$f"
;;
*)
# PDF file has 2 or more pages.
# Remove the first page.
cpdf "${fn} (original).${ext}" 2-end -o "$f"
;;
esac
# If you don't want to keep the original
# file, remove '#' from the next line.
# rm "${fn} (original).${ext}"
fi
done
Note that the code above assumes the cpdf
executable is in a directory that is within the PATH
environment variable, e.g.: /usr/local/bin/
Also note that if you do not want to keep the original files, then uncomment (remove the #
from in front of) the # rm "${fn} (original).${ext}"
command, just above the last line of code done
.
Solution 2:
For a solution that will run on MacOS without additional third-party tools, the following python script will remove the front page from a PDF. It will process PDF filenames sent as arguments, or it can be used in an Automator Quick Action.
#!/usr/bin/python
# coding=utf-8
from Quartz import PDFDocument
import sys
from Foundation import NSURL
def removePage(filename):
filename = filename.decode('utf-8')
pdfURL = NSURL.fileURLWithPath_(filename)
pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
if pdfDoc:
pageNum = pdfDoc.pageCount()
if pageNum > 1:
pdfDoc.removePageAtIndex_(0)
pdfDoc.writeToFile_(filename)
return
if __name__ == '__main__':
for filename in sys.argv[1:]:
removePage(filename)