Using PDF metadata to rename PDFs in Mac OS X

I have a number of PDFs that I downloaded from JSTOR. They come with unhelpful numeric filenames, but the metadata often contains at least a title, if not the author.

How do I batch rename a number of PDFs based on their metadata, so that each file is named "Author-Title" when both "Author" and "Title" are specified? If one is missing, I'd like the file to be renamed to whichever field is present, i.e. "Author" or "Title". If there is no metadata, I would like nothing to change.


Free bibliography software such as

  • JabRef with a batch rename plugin,
  • Docear (a mind-mapping application which uses JabRef as its bibliography tool)

can do this for you.

  • I assume that BibDesk (which is Mac-only rather than platform independent, but much more Mac-like) can also do that.

Here's one possible way, which would involve writing a script that shouldn't be too horrible.

Use JHOVE to extract metadata from the file and write it to an XML file. Use an XPath expression to pull out these paths:

jhove/repInfo/properties/property/property/property/values/value

where the three property elements in the path contain <name> elements whose values are, respectively, "PDFMetadata", "Info", and "Title" to grab the title, and "PDFMetadata", "Info", and "Author" to grab the author. Then you can use these programmatically to create the new file name.

This is a rough outline, but I think the idea can work.
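To make the outline a bit more concrete, here is a minimal Python sketch of the lookup step. It assumes JHOVE's XML output has already been written to a file and parsed, and it follows the PDFMetadata, Info, Title/Author chain by property name rather than by an exact XPath, since the precise nesting and namespace of the output may differ between JHOVE versions. The helper names (local, find_property, info_value) are made up for this example and are not part of JHOVE.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace, e.g. '{ns}property' -> 'property'."""
    return tag.rsplit("}", 1)[-1]


def find_property(parent, name):
    """Return the first descendant <property> whose <name> child equals name."""
    for el in parent.iter():
        if el is parent or local(el.tag) != "property":
            continue
        for child in el:
            if local(child.tag) == "name" and (child.text or "").strip() == name:
                return el
    return None


def info_value(root, key):
    """Follow PDFMetadata -> Info -> key; return the first <value> text or None."""
    node = root
    for name in ("PDFMetadata", "Info", key):
        node = find_property(node, name)
        if node is None:
            return None  # this part of the metadata is missing
    for el in node.iter():
        if local(el.tag) == "value" and el.text and el.text.strip():
            return el.text.strip()
    return None
```

With the XML parsed via ET.parse(...).getroot(), info_value(root, "Title") and info_value(root, "Author") each give either a string or None, which is all the renaming step needs.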

Full disclosure: I wrote most of the code for JHOVE.


On OS X, you can use mdls to query the Spotlight database for the properties it's extracted and indexed for the file:

$ mdls -name kMDItemTitle A-Self-Referential-Story.pdf 
kMDItemTitle = "This Is the Title of This Story, Which Is Also Found Several Times in the Story Itself"
$ mdls -name kMDItemAuthors A-Self-Referential-Story.pdf 
kMDItemAuthors = (
    "David Moser"
)
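Building on that, a rough Python sketch of the batch rename could look like the following. It parses mdls output in the format shown above, applies the "Author-Title" rule from the question, and leaves a file alone when there is no metadata or when the target name already exists. The function names, the replacement of "/" and ":" with "_", and the 200-character cap are choices made for this example only, and titles containing escaped or unusual characters may need extra handling.

```python
import re
import subprocess
from pathlib import Path


def parse_mdls(output):
    """Parse `mdls -name X file` output into a list of strings (empty if null)."""
    _, _, value = output.partition("=")
    value = value.strip()
    if not value or value == "(null)":
        return []
    if value.startswith("(") and value.endswith(")"):
        lines = value[1:-1].splitlines()  # array: one item per line
    else:
        lines = [value]                   # scalar string
    items = []
    for line in lines:
        item = line.strip().rstrip(",").strip('"')
        if item:
            items.append(item)
    return items


def build_stem(authors, title):
    """Return 'Author-Title', 'Author', 'Title', or None if nothing is known."""
    parts = []
    if authors:
        parts.append(", ".join(authors))
    if title:
        parts.append(title)
    stem = "-".join(parts)
    # Assumption: replace characters that are awkward in OS X file names.
    stem = re.sub(r"[/:]", "_", stem).strip()[:200]
    return stem or None


def mdls(attr, path):
    """Ask Spotlight for one attribute of one file."""
    result = subprocess.run(["mdls", "-name", attr, str(path)],
                            capture_output=True, text=True, check=True)
    return parse_mdls(result.stdout)


def rename_pdf(path):
    """Rename one PDF from its Spotlight metadata; return the resulting path."""
    authors = mdls("kMDItemAuthors", path)
    title = " ".join(mdls("kMDItemTitle", path))
    stem = build_stem(authors, title)
    if stem is None:
        return path                       # no metadata: change nothing
    target = path.with_name(stem + path.suffix)
    if target.exists():
        return path                       # never overwrite another file
    path.rename(target)
    return target
```

To process a whole folder, call rename_pdf on each entry of Path(folder).glob("*.pdf"). Trying it on a copy of the files first is a good idea, since Spotlight only knows about files it has already indexed.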

Zotero is very good at finding the metadata of books and scientific papers and at renaming your files based on that metadata. Its database is also used by Wikipedia's Citoid API.

You can drop a bunch of PDFs into Zotero. Zotero will rename them (check the configuration to adapt the naming pattern to your needs). Then you can copy the renamed files out of Zotero if you do not wish to keep using it.

There is even a plugin for advanced filename manipulation, ZotFile: http://zotfile.com/. ZotFile also lets you store PDFs outside the Zotero library in a custom folder.