Should I use <![CDATA[...]]> in HTML5?

Solution 1:

The CDATA structure isn't really for HTML at all; it's for XML.

People sometimes use it in XHTML inside script tags because it removes the need to escape <, > and & characters. It's unnecessary in HTML, though, since the contents of script tags in HTML are already parsed much like CDATA sections.
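To illustrate the difference, here is a minimal Python sketch (not part of the original answer) that uses the standard library's xml.etree.ElementTree as a stand-in for an XHTML parser. The sample script body and the parse_script_text helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def parse_script_text(markup):
    """Parse markup as XML and return the root element's text.

    Returns None when the markup is not well-formed XML.
    """
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


# Hypothetical script body containing characters that are special in XML.
raw = "<script>if (a < b && c) { run(); }</script>"
escaped = "<script>if (a &lt; b &amp;&amp; c) { run(); }</script>"
cdata = "<script>//<![CDATA[\nif (a < b && c) { run(); }\n//]]></script>"

print(parse_script_text(raw))      # None: the bare < and & are rejected
print(parse_script_text(escaped))  # script text after entity decoding
print(parse_script_text(cdata))    # script text, no escaping needed
```

In the CDATA variant the leading and trailing // survive in the element text, which is why the markers are placed on lines that JavaScript treats as comments.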

Edit: This is where we open that really mouldy old can of worms from 2002 over whether you're sending XHTML as text/html or as application/xhtml+xml like you’re “supposed” to :-)

Solution 2:

From the same page @pst linked to:

Element-specific parsing for script and style tags, Guidance for XHTML-HTML compatibility: "The following code with escaping can ensure script and style elements will work in both XHTML and HTML, including older browsers."

Maximum backwards compatibility:

<script type="text/javascript"><!--//--><![CDATA[//><!--
    ...
//--><!]]></script>

Simpler version, sort of incompatible with "much older browsers":

<script>//<![CDATA[
   ...
//]]></script>

So CDATA can be used in HTML5, provided the markers are hidden inside script or CSS comments as shown above, and this is what the official Guidance for XHTML-HTML compatibility recommends.

This is useful for polyglot HTML/XML/XHTML pages, which are served as strict application/xml XML during development but as text/html HTML5 in production for better cross-browser compatibility. Polyglot pages have their benefits; I've used this myself, as it's much easier to debug XML/XHTML5. Google Chrome, for example, will throw an error for invalid XML/XHTML5 (including, for example, bad character escaping), whereas the same page served as HTML5 will "just work", also known as "probably work".

Solution 3:

The spec seems to clear up this issue. script and style elements are considered "raw text elements", and CDATA sections are neither needed nor allowed in them. CDATA is only used in "foreign content", i.e. MathML and SVG. Note that there are some restrictions on what can go in a script element: basically, you can't put something like var x = '</script>' in there, because it will close the element, so the string needs to be split up, as pst noted in his answer. http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions
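The restriction on '</script>' can be seen with a small Python sketch. It uses the standard library's html.parser as a rough stand-in for a browser's HTML parser (an assumption: it is not a full HTML5 parser), and the ScriptCollector class and split_script helper are hypothetical names introduced for the example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects text inside <script> separately from text outside it."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chunks = []
        self.other_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        target = self.script_chunks if self.in_script else self.other_chunks
        target.append(data)


def split_script(markup):
    """Return (text inside script, text outside script) for the markup."""
    parser = ScriptCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.script_chunks), "".join(parser.other_chunks)


# The literal '</script>' ends the element early; the rest leaks out as text.
broken = split_script("<script>var x = '</script>';</script>")

# Escaping the slash keeps the whole statement inside the element.
fixed = split_script("<script>var x = '<\\/script>';</script>")

print(broken)
print(fixed)
```

In JavaScript, '<\/script>' evaluates to the same string as '</script>', so escaping the slash is one way to do the splitting; concatenating '</scr' + 'ipt>' is another.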

Solution 4:

HTML5-supporting browsers already read the content inside <style> and <script> tags as CDATA (character data). That means they will parse CSS and JavaScript fine, and will not interpret any markup characters in between. For example, HTML comment markers (<!-- or -->) between those tags are not treated as HTML comments.
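As a rough illustration of that behavior, the following Python sketch feeds the same text to the standard library's html.parser once inside a div and once inside a script element. The parser is only an approximation of what browsers do, and the TagLogger class and scan helper are hypothetical names for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records every start tag and every piece of text the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def scan(markup):
    """Return (list of start tags, concatenated text) for the markup."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.tags, "".join(parser.text)


# Inside a normal element, <b> is a tag and &amp; is a character reference.
in_div = scan("<div><b>hi</b> &amp; bye</div>")

# Inside script, the same characters are passed through as plain text.
in_script = scan('<script>write("<b>hi</b> &amp; bye");</script>')

print(in_div)
print(in_script)
```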

You only need to add a CDATA block inside <style> and <script> tags if you want your HTML5 page to be compatible with XHTML and XML, which do not read those tags as CDATA. XML and XHTML parsers read <style> and <script> content the way they read every other element, as PCDATA (parsed character data), so the contents are parsed as markup and can break when special characters appear between those tags. Wrapping the contents in a CDATA section prevents characters such as < and & from being interpreted as markup or character references.

The problem is that most HTML4/HTML5 browsers and parsers do not recognize CDATA sections between those tags, so the CDATA markers have to be hidden inside CSS or JavaScript comments for those agents if you add them for XHTML/XML support.

Also note that HTML comments (<!-- and -->) added inside those tags are ignored by HTML parsers but honored by XHTML ones, so in XHTML they comment out the CSS and JavaScript they surround. In the past, many people added comment markers between those tags to hide styles and scripts from very old browsers (pre-1998) that did not understand CSS or JavaScript. That strategy fails in XHTML without additional code.

So how do you combine all that inside <style> and <script> tags, and should you care?

I am a purist and like my HTML5 content to still be XML/XHTML-friendly, regardless of what markup recommendation I am using. I also like my pages to work in browsers that know CSS and older browsers that do not. So here are two solutions to support all those scenarios and still display your styles and scripts in modern browsers without error. They are totally safe to use in modern HTML5 browsers:

STYLE

<style type="text/css">
    <!--/*--><![CDATA[/*><!--*/

    /* put your styles here */

    /*]]>*/-->
</style>

SCRIPT

<script type="text/javascript">
    <!--//--><![CDATA[//><!--

    // put your scripts here

    //--><!]]>
</script>

  • These two code blocks will allow HTML5 browsers to work normally with CSS and JavaScript but hide them from older browsers that do not support those technologies.

  • XHTML browsers will now parse your CSS and JavaScript as before but not allow special characters like <, >, and & to be interpreted as markup or entities/escaped characters which would generate parsing errors. They are CDATA now.

  • XML parsers of your page will not understand your CSS and JavaScript, of course, but will accept any type of text you add in there and not try to parse it as markup. It is CDATA now.

  • HOW THE EXAMPLES WORK:

    Modern HTML5 browsers treat the content of script and style elements as CDATA by default, so the <!-- and --> comment markers inside them are not parsed as HTML comments. The CSS and script comments then cover the rest of the top and bottom lines, which means those two lines are always safely hidden and ignored in newer HTML5 browsers.

    Older browsers that do not know scripts or CSS do not treat script and style elements as CDATA and do not understand CSS or script comments, but they do understand HTML comments. They apply the HTML comment on the first line first (<!--/*-->), then read the <![CDATA[/*> block, which is an empty unknown element to them and is ignored. The HTML comment that follows hides all the CSS and scripts from there to the end of the block. The final <!]]> is another ignored empty element to them.

    XHTML parsers do not read the content of these elements as CDATA, but they do understand HTML comments, so they remove the first comment block. <![CDATA[ then starts a CDATA section for them, which wraps all the styles and scripts inside the tags until ]]> is read. Everything inside the CDATA section is passed through as plain CSS or script text, as in HTML5 parsers, rather than being parsed as markup. The CSS and script comments still apply, and because XHTML browsers know CSS and scripting, they still process the contents correctly.

    XML parsers follow the same rules as XHTML, except that they do not know CSS or script comments, so they simply treat everything inside the CDATA section as plain character text within the element.
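The XML/XHTML side of this explanation can be checked with a short Python sketch, using the standard library's xml.etree.ElementTree as a stand-in for an XHTML parser. The CSS rule and the run(...) call are placeholder contents invented for the example, and the wrappers are written without the indentation used above to keep the resulting text easy to read.

```python
import xml.etree.ElementTree as ET

# The STYLE wrapper from above, with a placeholder CSS rule inside.
style_src = (
    '<style type="text/css">'
    "<!--/*--><![CDATA[/*><!--*/\n"
    "p { color: red; }\n"
    "/*]]>*/-->"
    "</style>"
)

# The SCRIPT wrapper from above, with a placeholder statement inside.
script_src = (
    '<script type="text/javascript">'
    "<!--//--><![CDATA[//><!--\n"
    "run(a < b && c);\n"
    "//--><!]]>"
    "</script>"
)

# An XML parser drops the leading comment and unwraps the CDATA section,
# so what remains is the text a CSS or JavaScript engine would receive.
style_text = ET.fromstring(style_src).text
script_text = ET.fromstring(script_src).text

print(style_text)
print(script_text)
```

In both results the leftover marker fragments sit inside /* ... */ or // comments, apart from the trailing --> in the style text, which CSS ignores at the top level of a stylesheet.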