Loop over DOMDocument
I am following the suggestion from the question "Robust, Mature HTML Parser for PHP" about using DOMDocument to parse HTML that may be malformed.
Is there an easy way to loop over the parsed document? I would like to loop over HTML like this:
$html='<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';
$doc = new DOMDocument();
$doc->loadHTML($html);
???
foreach (??? as $node)
{
print $node->nodeName.':'.$node->nodeValue;
}
And get results somewhat like this.
ul:
li:value1
li:value1
li:value3
p:subvalue
p:hello world
Using $doc->childNodes by itself doesn't really do what I want, since it doesn't seem to descend into the lower branches of the tree. I used the code suggested by halfdan and I get results like this.
html:
html:value1
value1
value3
subvalue
hello world
Try this:
$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);
function showDOMNode(DOMNode $domNode) {
    foreach ($domNode->childNodes as $node) {
        print $node->nodeName.':'.$node->nodeValue;
        if ($node->hasChildNodes()) {
            showDOMNode($node);
        }
    }
}
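The nodeValue of an element is the concatenated text of all its descendants, and text nodes are visited as nodes in their own right, which may explain output that differs from what the question asks for. A minimal sketch of one way around this, assuming only element nodes should be listed and only each element's own direct text should be shown, follows. The function name collectElements and the $lines array are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: collects "tag:text" lines for element nodes only,
// using just the text that sits directly inside each element.
function collectElements(DOMNode $node, array &$lines): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text, comment and doctype nodes
        }
        // Gather only the direct text children of this element.
        $text = '';
        foreach ($child->childNodes as $grandchild) {
            if ($grandchild->nodeType === XML_TEXT_NODE) {
                $text .= $grandchild->nodeValue;
            }
        }
        $lines[] = $child->nodeName . ':' . trim($text);
        collectElements($child, $lines); // descend into nested elements
    }
}

$html = '<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';

libxml_use_internal_errors(true); // keep warnings about malformed markup quiet
$doc = new DOMDocument();
$doc->loadHTML($html);

$lines = [];
collectElements($doc, $lines);
echo implode("\n", $lines), "\n";
```

Since loadHTML normally wraps a fragment in html and body elements, those would be expected to appear at the top of the list; starting the walk from the body element instead of $doc would leave them out.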
I was having issues with elements that contained CDATA: even elements that didn't have children were reporting that they did.
I am not sure why that was.
The workaround I found was to change
if($node->hasChildNodes()) {
showDOMNode($node);
}
to
if($node->childNodes->length != 1) {
showDOMNode($node);
}
And the code now works perfectly.
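One likely explanation is that the text inside an element is itself a child node, so hasChildNodes() returns true for something like <li>value1</li>. Note also that comparing the child count to 1 skips any element whose single child is another element, such as a ul containing one li. A hedged alternative is to test for element children explicitly; the helper name hasElementChildren below is hypothetical and only a sketch.

```php
<?php
// Hypothetical helper: true only if at least one child is an element node.
function hasElementChildren(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>value1</li></ul>');

$ul = $doc->getElementsByTagName('ul')->item(0);
$li = $doc->getElementsByTagName('li')->item(0);

var_dump($li->hasChildNodes());     // the text "value1" counts as a child node
var_dump(hasElementChildren($li));  // but the <li> has no element children
var_dump(hasElementChildren($ul));  // the <ul> has one element child, the <li>
```

With a check like this in place of the length comparison, the recursion would still descend into an element that has exactly one element child.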
You can use PHP Simple HTML DOM Parser with the following code:
<?php
require_once 'simplehtmldom/simple_html_dom.php';
function iterateHtmlElements($html)
{
    $dom = str_get_html($html);
    $dom->set_callback('handleElement');
    $dom->__toString();
    echo "\n";
}
function handleElement(simple_html_dom_node $elem)
{
    if ($elem->tag == 'text') {
        echo $elem->innertext();
    } else {
        echo "\n" . $elem->tag . ": ";
    }
}
$html='<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';
iterateHtmlElements($html);
It works exactly as expected. I checked it with the input you provided and got the following results:
> php test2.php
ul:
li: value1
li: value1
li: value3
p: subvalue
p: hello world