DOM parser that allows HTML5-style </ in <script> tag

I had the same problem, and apparently you can hack your way through this by loading the document as XML and saving it as HTML :)

$d = new DOMDocument;
$d->loadXML('<script id="foo"><td>bar</td></script>');
echo $d->saveHTML();

But of course the markup must be error-free for loadXML to work.
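
Because loadXML refuses anything that is not well-formed, it can help to collect the parser errors rather than let PHP emit warnings. A minimal sketch, assuming a hypothetical helper named tryLoadXml (my own name, not part of DOMDocument or any library):

```php
<?php
// Hypothetical helper: returns a DOMDocument, or null if $markup is not well-formed XML.
// Parser messages are appended to $errors instead of being emitted as warnings.
function tryLoadXml(string $markup, array &$errors = []): ?DOMDocument
{
    $previous = libxml_use_internal_errors(true); // buffer libxml errors internally
    $doc = new DOMDocument;
    $ok = $doc->loadXML($markup);

    foreach (libxml_get_errors() as $error) {
        // Each $error is a LibXMLError with line and message properties.
        $errors[] = 'line ' . $error->line . ': ' . trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    return $ok ? $doc : null;
}

$errors = [];
$good = tryLoadXml('<script id="foo"><td>bar</td></script>', $errors);
$bad  = tryLoadXml('<script id="foo"><td>bar</script>', $errors); // <td> is never closed

if ($good !== null) {
    echo $good->saveHTML();
}
if ($bad === null) {
    echo "Second snippet rejected:\n", implode("\n", $errors), "\n";
}
```

If the helper returns null, the markup has to be repaired (or parsed with something more forgiving) before this trick can be used.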


Re: html5lib

You click on the download tab and download the PHP version of the parser.

You untar the archive in a local folder

 tar -zxvf html5lib-php-0.1.tar.gz
 x html5lib-php-0.1/
 x html5lib-php-0.1/VERSION
 x html5lib-php-0.1/docs/
 ... etc

You change directories and create a file named hello.php

cd html5lib-php-0.1
touch hello.php 

You place the following PHP code in hello.php

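<?php
// Load the parser from the unpacked archive. The path is assumed from the
// html5lib-php layout; adjust it if your copy differs.
require_once __DIR__ . '/library/HTML5/Parser.php';

$html = '<html><head></head><body>
<script type="text/x-jquery-tmpl" id="foo">
<table><tr><td>${name}</td></tr></table>
</script>
</body></html>';
$dom = HTML5_Parser::parse($html);
var_dump($dom->saveXML());
echo "\nDone\n";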

You run hello.php from the command line

php hello.php

The parser will parse the document tree, and return a DOMDocument object, which can be manipulated as any other DOMDocument object.
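
To illustrate what "manipulated as any other DOMDocument" can look like, here is a rough sketch. It builds a stand-in document with loadXML so that it runs without html5lib; in the setup above, $dom would instead be the object returned by HTML5_Parser::parse(). The data-checked attribute is just an illustrative name.

```php
<?php
// Stand-in for the parser result, so the sketch is runnable on its own.
$dom = new DOMDocument;
$dom->loadXML(
    '<html><head></head><body>'
    . '<script type="text/x-jquery-tmpl" id="foo">${name}</script>'
    . '</body></html>'
);

// Walk all <script> elements and pick out the jQuery templates.
foreach ($dom->getElementsByTagName('script') as $script) {
    if ($script->getAttribute('type') === 'text/x-jquery-tmpl') {
        // Read the template id and its raw text content.
        echo $script->getAttribute('id'), ': ', $script->textContent, "\n";
        // Modify the node like any other DOMElement.
        $script->setAttribute('data-checked', '1');
    }
}

echo $dom->saveXML();
```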


FluentDOM uses DOMDocument but suppresses loading notices and warnings. It does not have its own parser. You can add your own loaders (for example, one that uses html5lib).


I just found something that works (in my case).

Try passing LIBXML_SCHEMA_CREATE as the options argument of loadHTML in DOMDocument:

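$dom = new DOMDocument;

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
//$dom->loadHTML($buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// $buffer holds the HTML source string.
$dom->loadHTML($buffer, LIBXML_SCHEMA_CREATE);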

I added comment tags (<!-- ... -->) in my jQuery template blocks (CDATA blocks also failed) and DOMDocument did not touch the internal HTML.

Then, before I used the jQuery templates, I wrote a script to remove the comments.

$(function() {
    $('script[type="text/x-jquery-tmpl"]').text(function() {
        // The comment node in this context is actually a text node.
        return $.trim($(this).text()).replace(/^<!--([\s\S]*)-->$/, '$1');
    });
});

Not ideal, but I wasn't sure of a better workaround.