Parse HTML with PHP's HTML DOMDocument
Solution 1:
If you want to get:
- the text
- that's inside a <div> tag with class="text"
- that is, itself, inside a <div> with class="main"
I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags that have a specific name (while you only want some of them).
Instead, I would use an XPath query on your document, using the DOMXPath class.
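To make the difference concrete, here is a minimal sketch of what the getElementsByTagName route looks like on markup shaped like the one in the question. The sample string and the variable names ($sample, $doc, $wanted) are made up for illustration; the point is only that every <div> comes back and the filtering has to be done by hand.

```php
<?php
// Hypothetical sample markup: two outer "main" divs, each wrapping one "text" div.
$sample = '<div class="main"><div class="text">Capture this text 1</div></div>'
        . '<div class="main"><div class="text">Capture this text 2</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($sample);

// getElementsByTagName matches every <div>, outer and inner alike.
$allDivs = $doc->getElementsByTagName('div');
echo $allDivs->length, "\n"; // 4

// Filtering by hand means checking each node's class and its parent's class.
$wanted = [];
foreach ($allDivs as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $wanted[] = trim($div->textContent);
    }
}
print_r($wanted);
```

This works, but the nesting condition ends up spread over several lines of PHP, where an XPath expression can state it in one.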
For the XPath approach, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements matching the expression:
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
Executing this gives me the following output:
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
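One caveat with the query above: @class="text" is an exact string comparison, so it stops matching as soon as an element carries more than one class. A common XPath 1.0 idiom for matching a single class token is sketched below. The markup and the $hasClass helper are hypothetical and only there for illustration, and the libxml_use_internal_errors() call is an optional extra that keeps parser warnings (which real-world HTML may trigger) out of the output.

```php
<?php
// Hypothetical markup: both divs carry an extra class, and <section> is an
// HTML5 tag that the underlying libxml2 HTML parser may complain about.
$markup = '<section><div class="main wide">'
        . '<div class="text small">Capture this text 1</div>'
        . '</div></section>';

libxml_use_internal_errors(true);   // collect parser warnings instead of emitting them
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();              // discard whatever was collected

$xpath = new DOMXPath($doc);

// Exact comparison: finds nothing, because the attribute values are
// "main wide" and "text small", not "main" and "text".
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token comparison: pad the normalized class list with spaces and look
// for " name " inside it. $hasClass is an illustrative helper, not a PHP API.
$hasClass = function (string $name): string {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
};
$tokens = $xpath->query(
    '//div[' . $hasClass('main') . ']/div[' . $hasClass('text') . ']'
);

echo $exact->length, "\n";
foreach ($tokens as $node) {
    echo trim($node->textContent), "\n";
}
```

If the class attributes in your document always hold exactly one value, the simpler @class="..." form from the answer is enough.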
Solution 2:
You can use Simple HTML DOM: http://simplehtmldom.sourceforge.net/
It is a simple, easy-to-use DOM parser written in PHP, with which you can easily fetch the content of a div tag.
Something like this:
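// $html is assumed to be a simple_html_dom object, e.g. created with str_get_html() or file_get_html()
// Find all <div> elements that have the attribute id=text
$ret = $html->find('div[id=text]');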
See its documentation for more help.