Need script to automatically export Skim PDFs with embedded notes

I have a folder with a few thousand PDF files, most of which have Skim annotations (e.g., highlights and notes). If I had fewer files, I'd just go through each one, export the PDF with embedded notes, and be on my merry way, having the ability to read my annotations with Adobe Reader on a PC (this is the goal).

But since there are so many files, I need a script that will automatically go through this folder, perform the "export as PDF with embedded notes" function for each PDF that has Skim annotations, and then give the new file the name of the original file plus "with skim notes" (or some other designation to indicate it's the version with embedded notes). Secondly, if I make further annotations to the original PDF, it'd be great if the script could update the exported file with the embedded notes.

I assume this is possible, as it appears Skim has great AppleScript support, but I have no idea how to write such a script.

Thank you so much for any help you can offer. WG

Solution 1:

I had the same problem and wanted to batch-embed a lot of Skim PDFs. The AppleScript by Jess Riedel wasn't working for me, so I wrote a Python script to do the job.

I used the script to embed 568 PDFs scattered across a folder with various subfolders. It took 300 seconds. Be aware that the script won't process files with a double quote (") in their file path.

It is possible to create an OS X folder action that runs the script automatically every time a file in a folder changes. I haven't developed one yet because I haven't found any need for it.
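
Short of a folder action, one way to handle files that change later is to re-run the export only for PDFs whose exported copy is missing or older than the source. The following is a minimal sketch, assuming the copy sits next to the original with the '_embeded' suffix used by the script below. The function names are illustrative, and whether saving notes in Skim always updates the file's timestamps is an assumption, not something confirmed here.

```python
import os


def export_path(pdf_path, suffix="_embeded"):
    """Return the path of the exported copy for a given source PDF."""
    base, ext = os.path.splitext(pdf_path)
    return base + suffix + ext


def needs_export(pdf_path, suffix="_embeded"):
    """Return True if the export is missing or older than the source PDF."""
    out_path = export_path(pdf_path, suffix)
    if not os.path.exists(out_path):
        return True
    src = os.stat(pdf_path)
    # Assumption: saving notes in Skim changes the file's mtime or ctime.
    src_changed = max(src.st_mtime, src.st_ctime)
    return src_changed > os.stat(out_path).st_mtime
```

A batch run could call needs_export() on each PDF and skip the ones that are already up to date.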

You can download the script as an OS X service and as an Alfred workflow at https://github.com/alexandergogl/SkimPDF.

# -*- coding: utf-8 -*-

from os import system, walk, path


class SkimPDF(object):
    """Embed Skim notes into PDFs using Skim's bundled skimpdf tool."""
    def __init__(self):
        self.skimpdf_path = '/Applications/Skim.app/Contents/SharedSupport/skimpdf'
        self.embed_suffix = '_embeded'
        self.replace = False

    def embed_notes(self, in_pdf):
        """Embed skim notes to PDF."""
        if self.replace is False:
            out_pdf = "%s%s.pdf" % (in_pdf[:-4], self.embed_suffix)
        else:
            out_pdf = in_pdf

        # Embed notes
        cmd = '%s embed "%s" "%s"' % (self.skimpdf_path, in_pdf, out_pdf)
        result = system(cmd)

        # Compose message
        if result == 0:
            message = "Embedded notes in '%s'" % in_pdf
        else:
            message = "Failed with status %s for '%s'" % (result, in_pdf)

        return message

    def embed_notes_batch(self, folder):
        """Walk the given folder recursively and embed notes in every PDF."""
        messages = []
        i = 0
        # 'root' avoids shadowing the imported os.path module
        for root, subdirs, files in walk(folder):
            for name in files:
                if name.endswith(".pdf"):
                    i += 1
                    # embed notes to pdf
                    pdf_file = path.join(root, name)
                    # call the method on this instance, not the global 'skim'
                    result = self.embed_notes(pdf_file)
                    # add result message to report
                    messages.append(result)
                    # report current state
                    print(i)

        self.report(messages)

    def report(self, messages):
        """Print list of processed pdfs."""
        print("\n\nProcessing PDFs done:")

        for i, message in enumerate(messages, start=1):
            print("%s: %s" % (i, message))


skim = SkimPDF()
skim.replace = True  # set to False to write a copy next to the original instead
# skim.embed_suffix = '_embeded'  # uncomment and enter your own suffix if necessary

# embed notes of a single pdf
skim.embed_notes('../path/to/pdf file.pdf')

# batch embedding process with path to folder with literature
skim.embed_notes_batch('../path/to/Literature folder')
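
The double-quote limitation mentioned above comes from building the command as a single shell string. A minimal sketch of one way around it is to pass the arguments as a list to subprocess.run, so no shell quoting is involved. The helper names build_embed_command and embed_notes are illustrative and not part of the downloadable script, and the skimpdf path is the same one assumed in the class above.

```python
import subprocess

# Same location as in the class above; adjust if Skim is installed elsewhere.
SKIMPDF = "/Applications/Skim.app/Contents/SharedSupport/skimpdf"


def build_embed_command(in_pdf, out_pdf=None, skimpdf_path=SKIMPDF):
    """Return the argument list for 'skimpdf embed'; out_pdf defaults to in_pdf."""
    return [skimpdf_path, "embed", in_pdf, out_pdf or in_pdf]


def embed_notes(in_pdf, out_pdf=None, skimpdf_path=SKIMPDF):
    """Run skimpdf without a shell, so quotes in paths need no escaping."""
    cmd = build_embed_command(in_pdf, out_pdf, skimpdf_path)
    completed = subprocess.run(cmd)
    return completed.returncode
```

Because each path is passed as its own argument, a file name containing spaces or quotes reaches skimpdf unchanged.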

Solution 2:

This is the method I used on my entire Zotero library (~3 GB of PDFs). Note that the only annotations I use are highlighting (single color) and comments. Whether or not this successfully converts more complicated annotations depends on the details of the skimembed script, which I don't know much about.

What worked

The original skimembed script converts a single PDF with Skim annotations (which are stored as "extended attributes") to a single PDF with embedded annotations. It is a shell script you run from the command line (Terminal) using this notation:

sh skimembed pdf_with_skim_annotations.pdf

or more generally

sh /path/to/scripts/folder/skimembed /path/to/pdf/folder/pdf_with_skim_annotations.pdf

Here, sh is the shell program you use to interpret the script skimembed. This script is basically just an automated way of using File > Export...PDF with embedded notes from the Skim menu. However, it does not make a second copy of the PDF; the new version replaces the original and keeps the same name.
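
Since Skim keeps its notes in extended attributes, as mentioned above, it should be possible to check whether a given PDF carries any before converting it. The following is only a sketch: the attribute-name prefix is an assumption about how Skim names its attributes, the function names are illustrative, and list_xattrs relies on the xattr command-line tool that ships with macOS.

```python
import subprocess

# Assumption: Skim's extended attribute names start with this prefix.
SKIM_XATTR_PREFIX = "net_sourceforge_skim-app"


def has_skim_attr(attr_names, prefix=SKIM_XATTR_PREFIX):
    """Return True if any attribute name looks like a Skim notes attribute."""
    return any(name.startswith(prefix) for name in attr_names)


def list_xattrs(pdf_path):
    """List a file's extended attribute names via the macOS 'xattr' tool."""
    out = subprocess.run(["xattr", pdf_path],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

A call such as has_skim_attr(list_xattrs("some.pdf")) could then be used to skip PDFs without Skim notes.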

I then Googled around for a shell script that lets you apply skimembed recursively to all PDF files in a folder (including subfolders):

#!/bin/bash
# "$1" is quoted and read uses -r so paths with spaces or backslashes survive
find "$1" -type f -name "*.pdf" | while IFS= read -r f ; do
 sh /path/to/scripts/folder/skimembed "$f"
done

Here, $1 denotes the path to a folder, the first (and only) argument this script expects to receive. The find command returns all normal (-type f) files in that folder with the pdf ending (-name "*.pdf"). The results are piped (|) to a while loop that reads each path into the variable f.

I saved the above text in a file called recursiveskiembed.sh and then ran

sh recursiveskiembed.sh /path/to/pdfs/folder

In my case, the folder I chose was /Users/username/Library/Application Support/Zotero/Profiles/xxx123.default/zotero/storage. This takes every single PDF with Skim annotations in the folder and replaces it with a version that has normal, embedded PDF annotations.
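
For comparison, the same recursion can be sketched in Python. This is an illustrative equivalent of the find-and-loop script above, not something that was part of the original workflow. The skimembed path is the same placeholder as before, and the function names are made up for this example.

```python
import subprocess
from pathlib import Path

# Placeholder path, as in the shell script above.
SKIMEMBED = "/path/to/scripts/folder/skimembed"


def find_pdfs(folder):
    """Return every regular file ending in .pdf under folder, recursively."""
    return sorted(p for p in Path(folder).rglob("*.pdf") if p.is_file())


def embed_all(folder, skimembed=SKIMEMBED):
    """Run skimembed on each PDF found, one file at a time."""
    for pdf in find_pdfs(folder):
        # Arguments are passed as a list, so spaces in paths are safe.
        subprocess.run(["sh", skimembed, str(pdf)])
```

As with the shell version, this overwrites each PDF in place, so a backup beforehand is a good idea.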

What didn't work

I tried the user-submitted Skim scripts, in particular the skimalot script (which is a successor to skimembed), the FilingEagle script, and some others. But I couldn't get any of them to work. (Note that the sugarsync.com hosted files are dead links now.) Likewise, the half dozen AppleScript scripts always seem to fail with completely inscrutable errors.

The clobbergaurd script is supposed to check a large directory for filenames that differ only in ending to prevent you from overwriting stuff using skimalot, but I couldn't get it to work. (The sugarsync link is dead, but a Google search turned up this Dropbox copy.) So I just backed up my Zotero library and crossed my fingers.
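
The kind of check described above, finding filenames in the same folder that differ only in their ending, can be sketched in a few lines of Python. This is not the clobbergaurd script itself and makes no claim about how it works; the function name find_name_clashes is illustrative.

```python
import os
from collections import defaultdict


def find_name_clashes(folder):
    """Map 'dir/stem' to the filenames sharing that stem within one directory."""
    clashes = {}
    for root, _dirs, files in os.walk(folder):
        by_stem = defaultdict(list)
        for name in files:
            stem, _ext = os.path.splitext(name)
            by_stem[stem].append(name)
        for stem, names in by_stem.items():
            # More than one file with the same stem means a possible overwrite.
            if len(names) > 1:
                clashes[os.path.join(root, stem)] = sorted(names)
    return clashes
```

An empty result means no two files in any one directory share a name apart from the extension.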

Solution 3:

I have uploaded a script, which contains a command-line utility downloaded from Skim, and uses this code (note: it won't work without the utility!):

on open dropped_files
    set app_path to POSIX path of (path to me)
    repeat with current_file in dropped_files
        do shell script quoted form of (app_path & "Contents/Resources/skimpdf") & " embed " & quoted form of POSIX path of current_file & " " & quoted form of (((characters 1 through -5 of (POSIX path of current_file as string)) as string) & " with skim notes.pdf")
    end repeat
end open

You can download it from here. Drag all the PDFs (the files, not the folder) onto the droplet extracted from the zip file, and it should convert everything almost instantly!

Note: You'll have to right-click the app and open it once from the Finder, in order to bypass the "unknown developer" warning, and use the droplet. After that, you'll be all set!