Command line tool to convert DOC and DOCX files to PDF
Is there any command line tool to convert DOC and DOCX files to PDF? If not, can it be automated with an Automator script (open, print to PDF, close)?
Solution 1:
If you have Office:Mac 2008 Business Edition or Office:Mac 2011 Home/Business Edition, Automator actions are included with those editions. One of the Automator actions included with those versions of Office:Mac is "Convert Format of Word Documents", and one of the options in that Automator action is PDF. This page has great information about Automator and Office:Mac.
If you have Home/Student Edition instead of Business Edition, or don't have Office at all, you can accomplish it via AppleScript. Mac OS X Hints has an article about bulk converting text files to PDF via AppleScript, and the comments to that article give some options to convert DOC/DOCX to PDF via RTF. That might result in a loss of formatting or linking if you've got very complex DOC/DOCX files, but might be sufficient for files that aren't terribly complex.
Solution 2:
You can use the docx2pdf command line tool to bulk convert DOCX to PDF. It uses Microsoft Word itself to do the conversion, so you will need to have Word installed. On macOS, it uses JXA (JavaScript for Automation) to talk to Word, and on Windows it uses win32com.
pip install docx2pdf
# single file
docx2pdf myfile.docx
# entire folder
docx2pdf myfolder/
Disclaimer: I wrote docx2pdf after getting frustrated at the lack of cross-platform tools that convert DOCX to PDF directly through Microsoft Word; I needed a perfect replica with zero formatting issues. https://github.com/AlJohri/docx2pdf
Solution 3:
In Office 2016 the Automator approach will run into issues due to security sandboxing. (The symptom: Word stays open, and you get an "Error while printing" dialog.)
A workaround is to install LibreOffice, which can be used to convert files from the command line. On macOS, the command is:
/Applications/LibreOffice.app/Contents/MacOS/soffice \
--headless \
--convert-to pdf \
myfile.docx
The PDF will only be as good as LibreOffice's conversion from MS Office, of course, but it's adequate for many purposes.
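To convert a whole folder rather than one file, the same soffice command can be wrapped in a small loop. This is only a sketch: the function name convert_folder and the SOFFICE override variable are made up for illustration, and it assumes LibreOffice is installed at the path given above. The --outdir option tells soffice where to write the PDFs.

```shell
#!/bin/sh
# Path to the LibreOffice binary; set SOFFICE to override (hypothetical variable).
SOFFICE="${SOFFICE:-/Applications/LibreOffice.app/Contents/MacOS/soffice}"

# convert_folder SRC_DIR OUT_DIR  (hypothetical helper)
# Converts every .doc and .docx file directly inside SRC_DIR to PDF in OUT_DIR.
convert_folder() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.doc "$src_dir"/*.docx; do
        # An unmatched glob stays as literal text, so skip anything that is not a file.
        [ -e "$f" ] || continue
        "$SOFFICE" --headless --convert-to pdf --outdir "$out_dir" "$f"
    done
}
```

After sourcing the file, something like `convert_folder ~/Documents/reports ~/Documents/reports-pdf` would convert everything in the first folder. Subfolders are not visited.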
Another approach, if you really don't care about formatting, is to use pandoc and LaTeX:
pandoc -t latex myfile.docx -o myfile.pdf
You'll need to install pandoc and LaTeX as described in this answer, though, and your PDF will come out looking like a LaTeX document: basic formatting, headers, lists, etc. will generally be preserved, but things like fonts and margins won't.
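If you go the pandoc route, a small wrapper can derive the output name for you. The sketch below is illustrative only: docx_to_pdf_pandoc and the PANDOC override variable are hypothetical names, and it assumes pandoc and a LaTeX distribution are already on your PATH. Note that pandoc reads DOCX but not the older binary DOC format, so the helper rejects anything else.

```shell
#!/bin/sh
# Name of the pandoc executable; set PANDOC to override (hypothetical variable).
PANDOC="${PANDOC:-pandoc}"

# docx_to_pdf_pandoc FILE.docx  (hypothetical helper)
# Writes FILE.pdf next to the input, going through LaTeX as in the command above.
docx_to_pdf_pandoc() {
    src=$1
    case $src in
        *.docx) ;;
        *) echo "expected a .docx file: $src" >&2; return 1 ;;
    esac
    # Strip the .docx suffix and append .pdf to build the output path.
    dst="${src%.docx}.pdf"
    "$PANDOC" -t latex "$src" -o "$dst"
}
```

For example, `docx_to_pdf_pandoc myfile.docx` would run the same pandoc command shown above and produce myfile.pdf in the same folder.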