transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8") is NOT working

Solution 1:

I had the same problem on Android when serializing emoji characters. With the transformer's encoding set to UTF-8, each emoji came out as numeric character entities for its UTF-16 surrogate pair instead of as the character itself, which subsequently broke other parsers that read the data.
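To make the failure mode concrete, here is a small sketch of the difference between escaping each UTF-16 code unit and escaping the whole code point. It assumes the broken output had one numeric reference per surrogate half, which is my reading of the description above. The class and method names are made up for illustration and are not part of any XML API.

```java
public class SurrogateReferences {

    // Escapes every UTF-16 code unit separately (the broken form for emoji).
    static String perChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    // Escapes every Unicode code point once (a well-formed XML reference).
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append("&#").append(cp).append(';');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as two chars in Java
        System.out.println(emoji.length());      // 2
        System.out.println(perChar(emoji));      // &#55357;&#56832;
        System.out.println(perCodePoint(emoji)); // &#128512;
    }
}
```

A reference to a lone surrogate such as `&#55357;` is not a legal XML character reference, which would explain why downstream parsers reject that output.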

This is how I ended up solving it:

StringWriter sw = new StringWriter();
sw.write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
Transformer t = TransformerFactory.newInstance().newTransformer();

// UTF-16 works here because the result is a Java String (UTF-16 in memory),
// not bytes written to an output stream
t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.transform(new DOMSource(elementNode), new StreamResult(sw));

return IOUtils.toInputStream(sw.toString(), Charset.forName("UTF-8"));

Solution 2:

To answer the question: the following code works for me. It reads the data in the input encoding and writes it out in the output encoding.

        ByteArrayInputStream inStreamXMLElement = new ByteArrayInputStream(strXMLElement.getBytes(input_encoding));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder(); 
        Document docRepeat = db.parse(new InputSource(new InputStreamReader(inStreamXMLElement, input_encoding)));
        Node elementNode = docRepeat.getElementsByTagName(strRepeat).item(0);

        DOMSource domSourceRepeat = new DOMSource(elementNode);
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, output_encoding);

        // Serialize through a writer that encodes with the same output encoding.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        StreamResult sr = new StreamResult(new OutputStreamWriter(bos, output_encoding));

        transformer.transform(domSourceRepeat, sr);

        // Decode the bytes with the encoding they were written in.
        byte[] outputBytes = bos.toByteArray();
        strRepeatString = new String(outputBytes, output_encoding);

Solution 3:

I spent a significant amount of time debugging this issue because it worked well on my machine (Ubuntu 14 + Java 1.8.0_45) but not in production (Alpine Linux + Java 1.7).

Contrary to my expectation, the following code from the answer above didn't help:

ByteArrayOutputStream bos = new ByteArrayOutputStream();
StreamResult sr = new StreamResult(new OutputStreamWriter(bos, "UTF-8"));

But this one (Scala) worked as expected:

val out = new StringWriter()
val result = new StreamResult(out)
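
The snippet above is Scala. A Java sketch of the same idea might look like the following: the transformer writes characters into a StringWriter, and the conversion to UTF-8 bytes happens afterwards in plain Java. The class name, method names and sample XML are placeholders of mine, not taken from the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class StringWriterSerializer {

    // Serializes the node to a Java String; no byte encoding is involved yet.
    static String serialize(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Encodes the serialized text explicitly, independent of the platform default.
    static byte[] toUtf8(Node node) throws Exception {
        return serialize(node).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<greeting>Gr\u00fc\u00dfe</greeting>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        byte[] bytes = toUtf8(doc.getDocumentElement());
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Because the encoding step is done by String.getBytes with an explicit charset, the result should not depend on the platform default encoding, which is a likely difference between a desktop machine and a minimal container image.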

Solution 4:

I was able to work around the problem by wrapping the Document object passed to the DOMSource constructor. The wrapper's getXmlEncoding method always returns null; all other methods are delegated to the wrapped Document object.
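
Writing such a wrapper by hand means implementing every method of the Document interface. As a shorter sketch of the same idea, a java.lang.reflect.Proxy can do the delegation. This is not the answer's original code: the class name NullEncodingDocument and the method wrap are hypothetical, and whether a given transformer implementation accepts a proxied Document is an assumption you would need to verify on your platform.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NullEncodingDocument {

    // Returns a view of doc whose getXmlEncoding() always returns null;
    // every other call is forwarded to the wrapped document.
    static Document wrap(final Document doc) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if ("getXmlEncoding".equals(method.getName())) {
                    return null;
                }
                try {
                    return method.invoke(doc, args);
                } catch (InvocationTargetException e) {
                    // Rethrow the real exception (for example a DOMException).
                    throw e.getCause();
                }
            }
        };
        return (Document) Proxy.newProxyInstance(
                NullEncodingDocument.class.getClassLoader(),
                new Class<?>[] {Document.class},
                handler);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>x</a>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("original: " + doc.getXmlEncoding());
        System.out.println("wrapped:  " + wrap(doc).getXmlEncoding());
    }
}
```

The wrapped document would then be passed as new DOMSource(NullEncodingDocument.wrap(doc)). Note that only the Document itself is wrapped: child nodes returned through the proxy are the original nodes, and their getOwnerDocument() still returns the unwrapped document.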