How to strip a tag and all of its inner html using the tag's id?
I have the following html:
<html>
<body>
bla bla bla bla
<div id="myDiv">
more text
<div id="anotherDiv">
And even more text
</div>
</div>
bla bla bla
</body>
</html>
I want to remove everything starting from <div id="anotherDiv">
until its closing <div>
. How do I do that?
Solution 1:
With native DOM
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
$nodes = $xPath->query('//*[@id="anotherDiv"]');
if($nodes->item(0)) {
$nodes->item(0)->parentNode->removeChild($nodes->item(0));
}
echo $dom->saveHTML();
Solution 2:
You can use preg_replace()
like:
$string = preg_replace('/<div id="someid"[^>]+\>/i', "", $string);
Solution 3:
Using the native XML Manipulation Library
Assuming that your html content is stored in the variable $html:
$html='<html>
<body>
bla bla bla bla
<div id="myDiv">
more text
<div id="anotherDiv">
And even more text
</div>
</div>
bla bla bla
</body>
</html>';
To delete the tag by ID use the following code:
$dom=new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );
// get the tag
$div = $dom->getElementById('anotherDiv');
// delete the tag
if( $div && $div->nodeType==XML_ELEMENT_NODE ){
$div->parentNode->removeChild( $div );
}
echo $dom->saveHTML();
Note that certain versions of libxml
require a doctype
to be present in order to use the getElementById
method.
In that case you can prepend $html with <!doctype>
$html = '<!doctype>' . $html;
Alternatively, as suggested by Gordon's answer, you can use DOMXPath
to find the element using the xpath:
$dom=new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );
$xp=new DOMXPath( $dom );
$col = $xp->query( '//div[ @id="anotherDiv" ]' );
if( !empty( $col ) ){
foreach( $col as $node ){
$node->parentNode->removeChild( $node );
}
}
echo $dom->saveHTML();
The first method works regardless the tag. If you want to use the second method with the same id but a different tag, let say form
, simply replace //div
in //div[ @id="anotherDiv" ]
by '//form
'