Handle Doc/Docx Templates on a headless server to produce PDFs preferably without using OpenOffice.org
On a production web server I have to produce letters based on a template I received in MS Word binary format. I use PHP, and for the search-and-replace task I found PHPWord, which can handle Docx files, so I converted the template to OpenXML on my local workstation. Unfortunately the output is also Docx.
The goal is to produce a single PDF for the user to download so she can print out a bunch of letters at once very easily.
Now I need to find a way to either:
- Search and replace text in a PDF file
- Convert Docx to PDF without loss of formatting
- Edit the original Doc template without loss of formatting and without using COM
- Convert Docx to Doc without loss of formatting (which seems nearly impossible, because the template looks good in Word but the way the formatting is technically done is a big pile of...) so I could convert it using wvPDF
What I don't want to use, besides OpenOffice.org, are web services. I'm aware of PHPLiveDocx, but I don't want to depend on an external service for performance, availability, and security reasons. Buying a piece of software isn't an option in this case either (I can't influence that).
Since this runs on a public-facing web server, I don't want to pull in OpenOffice.org, not even headless, as it would bring along around 160MB of compressed(!) binaries, and best practice is not to load binaries you don't really need on a public-facing server. Using oo.o is a last resort, but I want to make sure I have ruled out every other option first.
The host OS is CentOS 5.5.
Where can I go from here?
Regards, luxifer
To my knowledge there is no application that can do this without some dependency on LibreOffice.
However, you don't need to install the whole office suite if you only perform command-line conversions.
You can check whether the tool unoconv meets your needs. It has python and python-uno as dependencies. The latter will also pull in libreoffice-core as a dependency, but not the whole office suite.
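For what it's worth, a minimal sketch of the unoconv route could look like the following. It assumes unoconv is installed and on the PATH; convert_dir and the directory you pass to it are made-up names for illustration, not part of unoconv.

```shell
#!/bin/sh
# Sketch: convert every .docx in a directory to PDF with unoconv.
# convert_dir is a hypothetical helper name, not something unoconv ships.
convert_dir() {
    dir=$1
    for f in "$dir"/*.docx; do
        # If the glob matched nothing, "$f" is the literal pattern: skip it.
        [ -e "$f" ] || continue
        # -f pdf selects the output format.
        unoconv -f pdf "$f" || echo "conversion failed: $f" >&2
    done
}
```

Calling convert_dir with the directory holding the filled-in letters should leave one PDF next to each .docx, since unoconv writes its output beside the input by default.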
AbiWord will convert between any formats it knows from the command line, which includes all those you mention. E.g., to convert .odt to .pdf:
abiword --to=pdf filename.odt
to convert .docx to .doc:
abiword --to=doc filename.docx
(If you want to search in it, just convert to something plain-text based like HTML or RTF or even TXT and search in there; convert back if need be.)
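As a rough sketch of the search idea, something like this could check whether a document contains a given string. It assumes abiword is on the PATH and that the converted file lands next to the input with the extension swapped (as in the examples above); doc_contains is a made-up helper name.

```shell
#!/bin/sh
# Sketch: search a word-processor document by converting it to plain text.
# doc_contains is a hypothetical helper, not part of AbiWord.
# Returns 0 if the text is found, 1 if not, 2 if the conversion failed.
doc_contains() {
    doc=$1
    needle=$2
    # Assumption: this writes e.g. letter.txt next to letter.docx.
    abiword --to=txt "$doc" || return 2
    # -F treats the needle as a fixed string, -q suppresses output.
    grep -qF -- "$needle" "${doc%.*}.txt"
}
```

For example, doc_contains letter.docx '{NAME}' would tell you whether a placeholder is still present; the file name and placeholder here are only illustrative.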
But what exactly are the obvious reasons not to install OpenOffice so you can use its libraries with, e.g., unoconv?
You could try the AbiWord server-side example given at this link: http://www.advogato.org/person/msevior/diary.html?start=65