Use OpenOffice from command line to convert HTML to RTF

I'm trying to build a bash script in Cygwin that will convert HTML files to RTF. In OS X this is trivial with textutil, but that doesn't exist for regular Linux or Cygwin. Instead I'm trying to use OpenOffice from the command line.

I've read elsewhere that OpenOffice can run headlessly with a program normally installed as /usr/bin/ooffice, but in Cygwin under Windows this obviously doesn't work: the OpenOffice installer doesn't build native Cygwin symlinks and might not even install the Windows equivalent of ooffice.

How can I use OpenOffice from the command line in Cygwin to convert HTML files to RTF files?


Solution 1:

There is a really handy command-line tool called unoconv (a Python script that drives OpenOffice/LibreOffice) that converts between any of the file formats OpenOffice/LibreOffice supports. You can read up about it on its site, and be sure to check out the man page. Many distros have packages for it that you can install easily, including, I believe, Cygwin.

Once you have it installed, usage in your case means naming the output format with -f and passing the input HTML file; unoconv writes file.rtf next to it:

unoconv -f rtf file.html

All done :)

Of course this could be scripted to handle multiple files as well. In bash or zsh, you could run something like this to convert a whole folder of HTML files:

for file in *.html; do
    unoconv -f rtf "$file"
done
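If you want the loop as something reusable in a script, here is a minimal sketch. The function names rtf_name and convert_all are made up for illustration, and it assumes unoconv writes its output next to the input with the new extension, which is how it behaves for me when only -f is given.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the .rtf output name from an .html input name.
# Only a trailing ".html" is stripped, so "html" elsewhere in the name is kept.
rtf_name() {
    printf '%s\n' "${1%.html}.rtf"
}

# Hypothetical helper: convert every .html file in a directory
# (defaults to the current directory).
convert_all() {
    local dir="${1:-.}" file
    for file in "$dir"/*.html; do
        # If the glob matched nothing, "$file" is the literal pattern; skip it.
        [ -e "$file" ] || continue
        # -f selects the output format; report the expected output name on success.
        unoconv -f rtf "$file" && echo "wrote $(rtf_name "$file")"
    done
}
```

Calling convert_all ./docs would then convert everything in ./docs. Stripping the suffix with ${1%.html} avoids a surprise with names like html-notes.html, where replacing the first "html" would rename the wrong part.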

Solution 2:

I would suggest JODConverter. It is a Java wrapper around the OpenOffice API for document conversion. It allows you to convert files like this:

java -jar jodconverter-cli-2.2.0.jar foo.html foo.rtf

It's also available in Python.

This is simpler than using the OpenOffice SDK DocumentSaver class directly, like this:

# Quote the classpath and the UNO URL: an unquoted ';' ends the command in bash.
CP=".;./bin;$OO/program/classes/jurt.jar;$OO/program/classes/ridl.jar"
CP="$CP;$OO/program/classes/sandbox.jar;$OO/program/classes/unoil.jar"
CP="$CP;$OO/program/classes/juh.jar"
java -classpath "$CP" DocumentSaver \
    "uno:socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" \
    file:///C:/test/foo.html file:///C:/test/foo.rtf
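Both approaches talk to an OpenOffice instance that is already running and listening on port 8100, so you need to start one first. Here is a rough sketch of that step. The helper names accept_string and start_office_listener are made up, the SOFFICE variable is an assumption you would point at your own soffice executable, and the single-dash options are the OpenOffice.org 3.x spelling (LibreOffice uses double dashes).

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the UNO accept string for a given port
# (defaults to 8100, the port used in the commands above).
accept_string() {
    local port="${1:-8100}"
    printf 'socket,host=127.0.0.1,port=%s;urp;' "$port"
}

# Hypothetical helper: start OpenOffice without a GUI, listening for
# UNO connections in the background.
# SOFFICE is an assumed variable holding the path to soffice(.exe).
start_office_listener() {
    "${SOFFICE:-soffice}" -headless -nofirststartwizard \
        -accept="$(accept_string "$1")" &
}
```

Run start_office_listener once, give it a few seconds to come up, and then run the java command. The accept string is quoted for the same reason as the UNO URL: it contains semicolons.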

Solution 3:

I can help with the first part of your question. Here's an example of running OpenOffice from the Cygwin command line:

/cygdrive/c/Program\ Files/OpenOffice.org\ 3/program/soffice.exe -help

That will give you a list of command line arguments. I didn't see any that would convert file types or even "Save As", but I didn't research the API. Perhaps you can fill in that part. I have OpenOffice.org 3.2 320m12(Build:9483).
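Since the installer doesn't put anything on the Cygwin PATH, a small wrapper function in your script or ~/.bashrc can stand in for the missing /usr/bin/ooffice. This is only a sketch: the function name ooffice mirrors the Linux command, and the path is the one from my install, so adjust it to yours.

```shell
#!/usr/bin/env bash

# Path taken from the example above; change it to match your install location.
SOFFICE="/cygdrive/c/Program Files/OpenOffice.org 3/program/soffice.exe"

# Hypothetical wrapper: forwards all arguments unchanged to the Windows executable.
ooffice() {
    "$SOFFICE" "$@"
}
```

After that, ooffice -help behaves like the full command. Keep in mind that the Windows executable will most likely want Windows-style paths for any files you pass it; Cygwin's cygpath -w can translate them.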