What does <![CDATA[]]> in XML mean?

I often find this strange CDATA tag in XML files:

<![CDATA[some stuff]]>

I have observed that this CDATA tag always comes at the beginning, and then followed by some stuff.

But sometimes it is used, sometimes it is not. I assume it is to mark that some stuff is the "data" that will be inserted after that. But what kind of data is some stuff? Isn't anything I write in XML tags some sort of data?

Solution 1:

CDATA stands for Character Data and it means that the data in between these strings includes data that could be interpreted as XML markup, but should not be.

The key differences between CDATA and comments are:

As Richard points out, CDATA is still part of the document, while a comment is not.
In CDATA you cannot include the string ]]> (CDEnd), while in a comment -- is invalid.
Parameter Entity references are not recognized inside of comments.

This means given these four snippets of XML from one well-formed document:

<!ENTITY MyParamEntity "Has been expanded">

<!--
Within this comment I can use ]]>
and other reserved characters like <
&, ', and ", but %MyParamEntity; will not be expanded
(if I retrieve the text of this node it will contain
%MyParamEntity; and not "Has been expanded")
and I can't place two dashes next to each other.
-->

<![CDATA[
Within this Character Data block I can
use double dashes as much as I want (along with <, &, ', and ")
*and* %MyParamEntity; will be expanded to the text
"Has been expanded" ... however, I can't use
the CEND sequence. If I need to use CEND I must escape one of the
brackets or the greater-than sign using concatenated CDATA sections.
]]>

<description>An example of escaped CENDs</description>
<!-- This text contains a CEND ]]> -->
<!-- In this first case we put the ]] at the end of the first CDATA block
     and the > in the second CDATA block -->
<data><![CDATA[This text contains a CEND ]]]]><![CDATA[>]]></data>
<!-- In this second case we put a ] at the end of the first CDATA block
     and the ]> in the second CDATA block -->
<alternative><![CDATA[This text contains a CEND ]]]><![CDATA[]>]]></alternative>

Solution 2:

A CDATA section is "a section of element content that is marked for the parser to interpret as only character data, not markup."

Syntactically, it behaves similarly to a comment:

<exampleOfAComment>
<!--
    Since this is a comment
    I can use all sorts of reserved characters
    like > < " and &
    or write things like
    <foo></bar>
    but my document is still well-formed!
-->
</exampleOfAComment>

... but it is still part of the document:

<exampleOfACDATA>
<![CDATA[
    Since this is a CDATA section
    I can use all sorts of reserved characters
    like > < " and &
    or write things like
    <foo></bar>
    but my document is still well formed!
]]>
</exampleOfACDATA>

Try saving the following as a .xhtml file (not .html) and open it using FireFox (not Internet Explorer) to see the difference between the comment and the CDATA section; the comment won't appear when you look at the document in a browser, while the CDATA section will:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" >
<head>
<title>CDATA Example</title>
</head>
<body>

<h2>Using a Comment</h2>
<div id="commentExample">
<!--
You won't see this in the document
and can use reserved characters like
< > & "
-->
</div>

<h2>Using a CDATA Section</h2>
<div id="cdataExample">
<![CDATA[
You will see this in the document
and can use reserved characters like
< > & "
]]>
</div>

</body>
</html>

Something to take note of with CDATA sections is that they have no encoding, so there's no way to include the string ]]> in them. Any character data which contains ]]> will have to - as far as I know - be a text node instead. Likewise, from a DOM manipulation perspective you can't create a CDATA section which includes ]]>:

var myEl = xmlDoc.getElementById("cdata-wrapper");
myEl.appendChild(xmlDoc.createCDATASection("This section cannot contain ]]>"));

This DOM manipulation code will either throw an exception (in Firefox) or result in a poorly structured XML document: http://jsfiddle.net/9NNHA/

Solution 3:

One big use-case: your xml includes a program, as data (e.g. a web-page tutorial for Java). In that situation your data includes a big chunk of characters that include '&' and '<' but those characters aren't meant to be xml.

Compare:

<example-code>
while (x &lt; len &amp;&amp; !done) {
    print( &quot;Still working, &apos;zzz&apos;.&quot; );
    ++x;
    }
</example-code>

with

<example-code><![CDATA[
while (x < len && !done) {
    print( "Still working, 'zzzz'." );
    ++x;
    }
]]></example-code>

Especially if you are copy/pasting this code from a file (or including it, in a pre-processor), it's nice to just have the characters you want in your xml file, w/o confusing them with XML tags/attributes. As @paary mentioned, other common uses include when you're embedding URLs that contain ampersands. Finally, even if the data only contains a few special characters but the data is very very long (the text of a chapter, say), it's nice to not have to be en/de-coding those few entities as you edit your xml file.

(I suspect all the comparisons to comments are kinda misleading/unhelpful.)

Solution 4:

I once had to use CDATA when my xml element needed to store HTML code. Something like

<codearea>
  <![CDATA[ 
  <div> <p> my para </p> </div> 
  ]]>
</codearea>

So CDATA means it will ignore any character which could otherwise be interpreted as XML tag like < and > etc.

What does <![CDATA[]]> in XML mean?

Solution 1:

Solution 2:

Solution 3:

Solution 4:

Related

Recent Posts