RTF to Plain Text in Java

Solution 1:

I use Swing's RTFEditorKit in Java 6 like this:

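import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// rtfBytes holds the raw RTF content; read() can throw IOException and BadLocationException
RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
String text = document.getText(0, document.getLength());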

and that's working.
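If it helps, here is the same idea as a minimal, self-contained sketch you can compile and run. The class name RtfToText, the method name toPlainText, and the sample RTF string are only there for illustration; the Swing calls are the same ones as in the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class, named for illustration only.
public class RtfToText {

    // Parses the given RTF bytes with Swing's RTFEditorKit and returns the document text.
    static String toPlainText(byte[] rtfBytes) throws IOException, BadLocationException {
        RTFEditorKit rtfParser = new RTFEditorKit();
        Document document = rtfParser.createDefaultDocument();
        // Read the RTF stream into the document, inserting at position 0.
        rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
        // Extract the whole document as text, without the RTF control words.
        return document.getText(0, document.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Tiny sample RTF document (an assumption, not taken from a real file).
        byte[] rtfBytes = "{\\rtf1\\ansi Hello \\b world\\b0 !}"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(toPlainText(rtfBytes));
    }
}
```

Wrapping the calls in one method keeps the two checked exceptions (IOException and BadLocationException) in a single place, so the caller can decide how to handle them.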

Solution 2:

Try Apache Tika: http://tika.apache.org/0.9/formats.html#Rich_Text_Format

Solution 3:

You might consider RTF Parser Kit as a lightweight alternative to the Swing RTFEditorKit. The line below shows plain text extraction from an RTF file: the RTF is read from the input stream, and the extracted text is written to the output stream.

new StreamTextConverter().convert(new RtfStreamSource(inputStream), outputStream, "UTF-8");

(full disclosure: I'm the author of RTF Parser Kit)