PDF file renaming according to metadata?
Is there something I can use that renames PDF files according to their content? Basically an equivalent of http://macscripter.net/viewtopic.php?id=27620 in Ubuntu.
This is very easy to achieve with exiftool.
For instance, the following command renames every file in the current directory to <title>.<extension>, where the title is read from each file's metadata:
exiftool '-filename<$title.%e' .
You can install exiftool on Ubuntu with:
sudo apt-get install libimage-exiftool-perl
Please consult the official documentation for more information:
http://www.sno.phy.queensu.ca/~phil/exiftool/filename.html
If you are comfortable with Python, you could use the script at http://blog.matt-swain.com/post/25650072381/a-lightweight-xmp-parser-for-extracting-pdf-metadata-in. I have just tested the scripts he provides (to get started, run pip install pdfminer) and they work nicely. The result looks something like this:
[{'ModDate': "D:20050422142709+02'00'", 'CreationDate': "D:20050422142709+02'00'", 'Producer': 'Mac OS X 10.3.8 Quartz PDFContext', 'Creator': 'Word'}]
You could use that output to rename your files.
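As a rough sketch of that last step, the snippet below turns a metadata dictionary like the one above into a file name. The function names, the "untitled" fallback, and the choice to fall back to the creation date when there is no Title are my own assumptions, not part of the linked script. It also assumes the values are plain strings, as in the sample output.

```python
import re

def pdf_date_to_stamp(value):
    """Turn a PDF date such as "D:20050422142709+02'00'" into '2005-04-22'."""
    match = re.match(r"D:(\d{4})(\d{2})(\d{2})", value or "")
    return "-".join(match.groups()) if match else None

def filename_from_metadata(meta, fallback="untitled"):
    """Pick a base name: Title if present, else the creation date, else fallback."""
    title = (meta.get("Title") or "").strip()
    base = title or pdf_date_to_stamp(meta.get("CreationDate")) or fallback
    # Replace characters that are awkward in file names with underscores.
    base = re.sub(r"[^\w .-]+", "_", base).strip(" ._")
    return (base or fallback) + ".pdf"

# The sample metadata from above has no Title, so the date is used instead.
meta = {'ModDate': "D:20050422142709+02'00'",
        'CreationDate': "D:20050422142709+02'00'",
        'Producer': 'Mac OS X 10.3.8 Quartz PDFContext',
        'Creator': 'Word'}
print(filename_from_metadata(meta))  # prints 2005-04-22.pdf
```

Once you are happy with the generated names, the actual rename could be done with os.rename(old_path, new_path).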
There is another alternative: pdftk, which you can install with sudo apt-get install pdftk. With that tool you can run a command like pdftk myfile.pdf dump_data, which prints the metadata as a series of InfoKey and InfoValue lines:
InfoKey: Creator
InfoValue: Word
InfoKey: Producer
InfoValue: Mac OS X 10.3.8 Quartz PDFContext
InfoKey: ModDate
InfoValue: D:20050422142709+02'00'
InfoKey: CreationDate
InfoValue: D:20050422142709+02'00'
PdfID0: d7af25c8df737276d8d6b5de49d94d92
PdfID1: d7af25c8df737276d8d6b5de49d94d92
NumberOfPages: 58
Again, you could use that information in a renaming script. I feel the renaming step is best customized yourself, because it depends on whether you want just the title, title-author, or something else.
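To give an idea of what such a script might look like, here is a minimal sketch that parses dump_data output into a dictionary and fills in a naming pattern. The function names, the pattern syntax, and the "untitled" fallback are hypothetical choices of mine. The parser only looks at InfoKey and InfoValue lines and ignores everything else.

```python
def parse_dump_data(text):
    """Collect InfoKey/InfoValue pairs from `pdftk file.pdf dump_data` output."""
    info, key = {}, None
    for line in text.splitlines():
        # Split on the first colon only, so values like "D:2005..." stay intact.
        name, _, value = line.partition(":")
        value = value.strip()
        if name == "InfoKey":
            key = value
        elif name == "InfoValue" and key is not None:
            info[key] = value
            key = None
    return info

def build_name(info, pattern="{Title} - {Author}", fallback="untitled"):
    """Fill the pattern from the metadata; use the fallback if a field is missing."""
    try:
        return pattern.format(**info) + ".pdf"
    except KeyError:
        return fallback + ".pdf"

sample = """InfoKey: Creator
InfoValue: Word
InfoKey: Producer
InfoValue: Mac OS X 10.3.8 Quartz PDFContext
NumberOfPages: 58"""
info = parse_dump_data(sample)
print(build_name(info, pattern="{Creator}"))  # prints Word.pdf
print(build_name(info))                       # no Title/Author: prints untitled.pdf
```

In a real script the text would come from pdftk itself, for example via subprocess.run(["pdftk", path, "dump_data"], capture_output=True, text=True).stdout, and the pattern is where you decide between title, title-author, or something else.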