HTML inside XML: should I use CDATA or encode the HTML? [closed]

I am using XML to share HTML content. AFAIK, I could embed the HTML either by:

  • Encoding it: I don't know whether this is completely safe, and I would have to decode it again.

  • Using CDATA sections: I believe I could still have problems if the content contains the closing delimiter "]]>" or certain hexadecimal characters. On the other hand, the XML parser would extract the content transparently for me.

Which option should I choose?

UPDATE: The XML will be created in Java and passed as a string to a .NET web service, where it will be parsed back. Therefore I need to be able to export the XML as a string and load it using "doc.LoadXml(xmlString);"


Solution 1:

The two options are almost exactly the same. Here are your two choices:

<html>This is &lt;b&gt;bold&lt;/b&gt;</html>

<html><![CDATA[This is <b>bold</b>]]></html>

In both cases, you have to check your string for special characters to be escaped. Lots of people pretend that CDATA strings don't need any escaping, but as you point out, you have to make sure that "]]>" doesn't slip in unescaped.

In both cases, the XML processor will return your string to you decoded.
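
To make the comparison concrete, here is a minimal Java sketch of both approaches. The class and method names (HtmlInXml, encode, cdata, decode) are made up for illustration. It assumes the usual trick of splitting "]]>" across two CDATA sections, and it does not deal with characters that XML 1.0 forbids outright.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class HtmlInXml {
    // Option 1: entity-encode the characters that are special in element content.
    static String encode(String s) {
        return s.replace("&", "&amp;")   // must run first, or it re-escapes the others
                .replace("<", "&lt;")
                .replace(">", "&gt;");   // also prevents a literal "]]>" in the text
    }

    // Option 2: wrap in CDATA, splitting any "]]>" so it cannot end the section early.
    static String cdata(String s) {
        return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parse <html>BODY</html> and return the text the parser hands back.
    static String decode(String body) {
        try {
            String xml = "<html>" + body + "</html>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "This is <b>bold</b> and ]]> too";
        // Both round trips should give back the original string.
        System.out.println(decode(encode(html)).equals(html));
        System.out.println(decode(cdata(html)).equals(html));
    }
}
```

Either way the parser returns the same decoded string, which is why the two options are nearly interchangeable.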

Solution 2:

CDATA is easier to read by eye, while encoded content can safely contain end-of-CDATA markers, but you don't have to care about either. Just use an XML library and stop worrying about it. Then all you have to say is "Put this text inside this element" and the library will either encode it or wrap it in CDATA markers.
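
As a rough sketch of "let the library do it" on the Java side, using only the JDK's built-in DOM and Transformer APIs (the class name XmlBuilderSketch and the method toXml are hypothetical, and the element name html is borrowed from Solution 1):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlBuilderSketch {
    // Put arbitrary text inside an <html> element and serialize the document to a string.
    static String toXml(String htmlFragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("html");
            root.setTextContent(htmlFragment);  // stored as plain text; escaped when serialized
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("This is <b>bold</b>"));
    }
}
```

The resulting string is ordinary XML, so a conforming parser on the receiving side should hand the original text back without any manual decoding. If you prefer CDATA output, doc.createCDATASection(...) can be appended to the element instead of calling setTextContent.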

Solution 3:

CDATA for simplicity.

Solution 4:

If you use CDATA, then you must read it back correctly (textContent, value and innerHTML are properties that will NOT return the proper data).

Let us say that you use an XML structure similar to this:

<response>
    <command method="setcontent">
        <fieldname>flagOK</fieldname>
        <content>479</content>
    </command>
    <command method="setcontent">
        <fieldname>htmlOutput</fieldname>
        <content>
            <![CDATA[
            <tr><td>2013/12/05 02:00 - 2013/12/07 01:59 </td></tr><tr><td width="90">Rastreado</td><td width="60">Placa</td><td width="100">Data hora</td><td width="60" align="right">Km/h</td><td width="40">Direção</td><td width="40">Azimute</td><td>Mapa</td></tr><tr><td>Silverado</td><td align='left'>CQK0052</td><td>05/12/2013 13:55</td><td align='right'>113</td><td align='right'>NE</td><td align='right'>40</td><td><a href="http://maps.google.com/maps?q=-22.6766,-50.2218&amp;iwloc=A&amp;t=h&amp;z=18" target="_blank">-22.6766,-50.2218</a></td></tr><tr><td>Silverado</td><td align='left'>CQK0052</td><td>05/12/2013 13:56</td><td align='right'>112</td><td align='right'>NE</td><td align='right'>23</td><td><a href="http://maps.google.com/maps?q=-22.6638,-50.2106&amp;iwloc=A&amp;t=h&amp;z=18" target="_blank">-22.6638,-50.2106</a></td></tr><tr><td>Silverado</td><td align='left'>CQK0052</td><td>05/12/2013 18:00</td><td align='right'>111</td><td align='right'>SE</td><td align='right'>118</td><td><a href="http://maps.google.com/maps?q=-22.7242,-50.2352&amp;iwloc=A&amp;t=h&amp;z=18" target="_blank">-22.7242,-50.2352</a></td></tr>
            ]]>
        </content>
    </command>
</response>

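In JavaScript, you decode it by loading the XML (with jQuery, for example) into a variable such as xmlDoc below, and then reading the nodeValue of the first child of the second occurrence ( item(1) ) of the content tag. This assumes the CDATA section is the first child of content: if whitespace separates <content> from <![CDATA[, as in the indented listing above, childNodes[0] is a whitespace text node and the CDATA section is the next child.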

xmlDoc.getElementsByTagName("content").item(1).childNodes[0].nodeValue

or (both notations are equivalent)

xmlDoc.getElementsByTagName("content")[1].childNodes[0].nodeValue
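
For comparison, the same lookup can be sketched in Java with the JDK's DOM parser. This version scans the children for a CDATA node instead of assuming it sits at index 0, so leading whitespace does not matter. The class name CdataReaderSketch and the method firstCdata are hypothetical, and the sketch relies on the parser's default non-coalescing mode, which keeps CDATA sections as separate nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CdataReaderSketch {
    // Return the data of the first CDATA section under the index-th <content>
    // element, or null if that element is missing or has no CDATA child.
    static String firstCdata(String xml, int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node content = doc.getElementsByTagName("content").item(index);
            if (content == null) {
                return null;
            }
            NodeList children = content.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes; stop at the first CDATA section.
                if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                    return child.getNodeValue();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response>"
                + "<command><content>479</content></command>"
                + "<command><content>\n  <![CDATA[<tr><td>Silverado</td></tr>]]>\n</content></command>"
                + "</response>";
        System.out.println(firstCdata(xml, 1));
    }
}
```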