Automate removing the first page of a bunch of PDFs

Solution 1:

I've done this using the Coherent PDF Command Line Tools Community Release.

You can download either the pre-built tools or the source code to compile the later on your own, however the latter requires OCaml be installed when compiling. So the pre-built tools are the easiest way to go. The downloaded distribution file, e.g. cpdf-binaries-master.zip, contains binaries for Linux, OS X/macOS and Windows and is ~5 MB in size.

Once downloaded and extracted (double-clicking the .zip file) you'd copy the, e.g. ~/Downloads/cpdf-binaries-master/OSX-Intel/cpdf, file to a location that is defined in the PATH environment variable, e.g. /usr/local/bin/ to make it globally available on the command line in Terminal. If it's not in the PATH then you'll have to use the fully qualified pathname to the cpdf executable or ./cpdf if it's in the present working directory (pwd). In Terminal, type echo $PATHso show the PATH.

The syntax for removing the first page when the PDF file has 2 or more pages is:

cpdf in.pdf 2-end -o out.pdf

Because cpdf reads the original file (in.pdf) and writes to a new file (out.pdf) the out.pdf filename needs to be different if saved to the same location as the in.pdf file, or save it to a different location with the same in.pdf filename as the out.pdf filename, or whatever out.pdf filename you want.

Below, I'll show two examples of automation using cpdf to remove the first page of a PDF file, assuming it has two or more pages. One using an Automator workflow as a Service available in Finder on the Services Context Menu and the other as a bash script, to use in Terminal.

As an Automator Service Workflow available in Finder on the Services Context Menu:

In Automator create a new service workflow using the settings as shown in the image below and copy and paste the code below the image into the Run Shell Script action and save as e.g.: Remove First Page from PDF

To use Remove First Page from PDF, in Finder select the PDF files you want to remove the first page from and then select Remove First Page from PDF from the Context Menu via right-click or control-click, or from Finder > Services > Remove First Page from PDF

Automator Service Workflow Image

for f in "$@"; do
            # Get Page Count.
        p="$(/usr/local/bin/cpdf -info "$f" | awk '/Pages:/{print $2}')"
            # Get file extension.
        ext="${f##*.}"
            # Get filename without extension.
        fn="${f%.*}"
            # Rename original file to "filename (original).pdf".
            # Use '-n' to not overwrite an existing file.
        mv -n "$f" "${fn} (original).${ext}"
        case "$p" in
            1)
                    # PDF file has 1 page. Make a copy to the
                    # original name for consistency of output.
                    # Use '-n' to not overwrite an existing file.
                cp -n "${fn} (original).${ext}" "$f"
                ;;
            *)
                    # PDF file has 2 or more pages.
                    # Remove the first page.
                /usr/local/bin/cpdf "${fn} (original).${ext}" 2-end -o "$f" 
                ;;      
        esac
            # If you don't want to keep the original
            #  file, remove '#' from the next line.
        # rm "${fn} (original).${ext}"
done

Note that the PATH passed a Run Shell Script action in Automator is, /usr/bin:/bin:/usr/sbin:/sbin. So the code above is using the fully qualified pathname to the cpdf executable, /usr/local/bin/cpdf as that is where I placed it so as to be available in Terminal by way of using its name cpdf, only.

Also note that if you do not want to keep the original files, then uncomment (remove the # from in front of) the # rm "${fn} (original).${ext}" command, just above the last line of code done.

As a bash script to use in Terminal:

Create the bash script in the following manner:

In Terminal:

touch rfpfpdf
open rfpfpdf

Copy the code block, starting with #!/bin/bash, below into the opened rfpfpdf document and then save it.

Back in Terminal:

Make the script executable:

chmod u+x rfpfpdf

Now move the rfpfpdf script to, e.g.: /usr/local/bin/

sudo mv rfpfpdf /usr/local/bin/

You can then change directory cd ... to a directory that has the PDF files that you want to remove the first page from and then simply type rfpfpdf and press enter.

The original files will be moved to "filename (original).pdf" and the newly create PDF file sans the first page, if 2 or more pages, will have the original filename.pdf name.

#!/bin/bash

for f in *.pdf *.PDF; do
    if [[ -f $f ]]; then
            # Get Page Count.
        p="$(cpdf -info "$f" | awk '/Pages:/{print $2}')"
            # Get file extension.
        ext="${f##*.}"
            # Get filename without extension.
        fn="${f%.*}"
            # Rename original file to "filename (original).pdf".
            # Use '-n' to not overwrite an existing file.
        mv -n "$f" "${fn} (original).${ext}"
        case "$p" in
            1)
                    # PDF file has 1 page. Make a copy to the
                    # original name for consistency of output.
                    # Use '-n' to not overwrite an existing file.
                cp -n "${fn} (original).${ext}" "$f"
                ;;
            *)
                    # PDF file has 2 or more pages.
                    # Remove the first page.
                cpdf "${fn} (original).${ext}" 2-end -o "$f" 
                ;;      
        esac
            # If you don't want to keep the original
            #  file, remove '#' from the next line.
        # rm "${fn} (original).${ext}"
    fi
done

Note that the code above assumes the cpdf executable is in a directory that is within the PATH environment variable, e.g.: /usr/local/bin/

Solution 2:

For a solution that will run on MacOS without additional third-party tools, the following python script will remove the front page from a PDF. It will process PDF filenames sent as arguments, or it can be used in an Automator Quick Action.

#!/usr/bin/python
# coding=utf-8

from Quartz import PDFDocument
import sys
from Foundation import NSURL

def removePage(filename):
    filename = filename.decode('utf-8')
    pdfURL = NSURL.fileURLWithPath_(filename)
    pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
    if pdfDoc:
        pageNum = pdfDoc.pageCount()
        if pageNum > 1:
            pdfDoc.removePageAtIndex_(0)
            pdfDoc.writeToFile_(filename)
    return

if __name__ == '__main__':
    for filename in sys.argv[1:]:
        removePage(filename)

Automate removing the first page of a bunch of PDFs

Solution 1:

Solution 2:

Related

Recent Posts