Command Line Tool to Batch Convert .EML/.EMLX/.MBOX to Searchable PDFs?

Solution 1:

I had to do this with ~180 emails, and I used a command-line tool I found on GitHub that converts .eml to .pdf via .html: https://github.com/nickrussler/eml-to-pdf-converter

It takes a little while to convert each .eml file - 22 minutes for 186 emails with lots of images, or roughly 7 seconds per email - so it's probably not helpful for a 500k email task, which at that rate would take about 40 days. (Maybe if you're reeeally not in a rush and not afraid of multiprocessing!) If it is helpful for you or anyone else, though, here's how I got it to work in the bash command line:

  1. git clone the repo

  2. Install the wkhtmltopdf binary (installing with pip alone is insufficient) from here: https://wkhtmltopdf.org/downloads.html

  3. From within the cloned repo, generate the email converter .jar file: ./gradlew shadowJar

  4. Run a for loop to convert every file in the .mbox (or in a directory of .eml files):

for file in /path/to/mailbox.mbox/*; do
   java -jar ./build/libs/emailconverter-2.0.1-all.jar "$file"
done
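
If the one-at-a-time loop is too slow, the multiprocessing idea mentioned above could look roughly like this with find and xargs. This is only a sketch: I have not checked whether several instances of the converter can safely run side by side, and the JOBS count and the convert_parallel name are my own placeholders, not something from the repo.

```shell
# Assumed defaults; point JAR at the file produced by ./gradlew shadowJar.
JAR="${JAR:-./build/libs/emailconverter-2.0.1-all.jar}"
JOBS="${JOBS:-4}"   # hypothetical number of parallel conversions

# convert_parallel DIR
# Runs the converter once per .eml file found under DIR, JOBS at a time.
convert_parallel() {
    # -print0 / -0 keep file names with spaces intact;
    # -n 1 passes one file per java invocation, -P sets the parallelism.
    find "$1" -type f -name '*.eml' -print0 |
        xargs -0 -n 1 -P "$JOBS" java -jar "$JAR"
}
```

Call it as convert_parallel /path/to/eml-directory. Each java process still pays its own startup cost, so the speedup depends mostly on how many cores you have.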

Solution 2:

I recently came across "How to open eml files?" on AskUbuntu. It suggests using munpack, which is part of mpack. It can unpack an .eml into its HTML or plain-text parts. There are several tools that convert HTML to PDF; WeasyPrint is one of them, and you can install it via pip. mpack is also available in Homebrew. Assuming you have Homebrew installed, you can install it with:

brew install mpack

Then run the following, replacing my.eml with the path to your file:

munpack -t my.eml
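
To chain the two tools together, something like the following might work. Treat it as a rough sketch: the eml_to_pdf function and the output naming are my own invention, I am assuming the weasyprint command that pip installs is on your PATH, and because I am not sure what file names munpack gives the unpacked parts, the HTML parts are detected by content with file rather than by extension.

```shell
# eml_to_pdf EML OUT_DIR
# Unpacks one message with munpack, then renders each HTML part to a PDF.
eml_to_pdf() {
    eml=$1                          # path to the .eml file
    out_dir=$2                      # where the PDFs should go (hypothetical layout)
    work=$(mktemp -d) || return 1   # scratch directory for the unpacked parts
    mkdir -p "$out_dir"
    base=$(basename "$eml" .eml)

    # -t also writes the text parts to files; -C unpacks into the scratch dir.
    # The message is fed on stdin so its path is unaffected by the directory change.
    munpack -t -C "$work" < "$eml" || return 1

    n=0
    for part in "$work"/*; do
        [ -f "$part" ] || continue
        # Render only the parts that `file` identifies as HTML.
        if [ "$(file -b --mime-type "$part")" = "text/html" ]; then
            n=$((n + 1))
            weasyprint "$part" "$out_dir/$base-$n.pdf"
        fi
    done
    rm -rf "$work"
}
```

For a whole directory you could wrap it in a loop, e.g. for f in /path/to/emails/*.eml; do eml_to_pdf "$f" ./pdfs; done. Messages that only have a plain-text part produce no PDF with this sketch.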