Parse HTML with PHP's HTML DOMDocument

Solution 1:

If you want to get:

The text
that's inside a <div> tag with class="text"
that's, itself, inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns every element with a given tag name, when you only want some of them.
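To make that concrete, here is a small sketch of what getElementsByTagName returns. The sample markup (including the "sidebar" div) is made up for illustration and is not from the question:

```php
<?php
// Hypothetical sample markup: one wanted <div> plus an unrelated one.
$html = '<div class="main"><div class="text">Wanted</div></div>'
      . '<div class="sidebar">Not wanted</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName matches on the tag name only, so it returns
// every <div> in the document, whatever its class.
$divs = $dom->getElementsByTagName('div');

$classes = [];
foreach ($divs as $div) {
    $classes[] = $div->getAttribute('class');
}

echo $divs->length, "\n";           // number of <div> elements found
echo implode(', ', $classes), "\n"; // their class attributes, in document order
```

You would then have to filter that list by hand, checking each element's class and its parent's class.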

Instead, I would use an XPath query on your document, using the DOMXPath class.

For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:

$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
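A side note on the loading step: with HTML taken from a real page, loadHTML can emit PHP warnings for markup that is malformed or that the parser does not recognise. A possible sketch for keeping those quiet, assuming you are happy to collect the errors and discard them (the sample markup here is invented):

```php
<?php
// Hypothetical real-world markup: <section> is an HTML5 element,
// and the closing </p> is missing.
$html = '<section><div class="main"><div class="text"><p>Some text</div></div></section>';

// Collect libxml's parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Print whatever the parser complained about (possibly nothing), then clear it.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
```

The rest works the same way once the document is loaded.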

And then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}

Executing this gives me the following output:

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
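One caveat: @class="main" only matches when the attribute is exactly "main", so an element with class="main wide" would be skipped. If your real markup carries several classes per element, a common XPath 1.0 idiom is to pad the attribute with spaces and look for the class as a whole word. Here is a sketch of that; hasClass is a hypothetical helper of mine (not part of DOMXPath), the sample markup is invented, and the helper assumes the class name contains no quotes:

```php
<?php
// Hypothetical helper (not part of DOMXPath): builds an XPath predicate that
// matches a class name as a whole token inside the class attribute.
// Assumes $class contains no quote characters.
function hasClass(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

// Invented markup: the first pair carries extra classes, the second
// pair has a class that merely starts with "text".
$html = '<div class="main wide"><div class="text intro"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="textarea">Skip this</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$query = '//div[' . hasClass('main') . ']/div[' . hasClass('text') . ']';

$texts = [];
foreach ($xpath->query($query) as $tag) {
    $texts[] = trim($tag->nodeValue);
}

var_dump($texts);
```

Only the first inner div is captured: "text intro" contains text as a whole word, while "textarea" does not.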

Solution 2:

You can use PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/

It is a very simple, easy-to-use DOM parser written in PHP, with which you can easily fetch the content of a div tag.

Something like this:

// $html is assumed to be the object returned by str_get_html() or file_get_html()
// Find all <div> elements that have the attribute id=text
$ret = $html->find('div[id=text]');

See its documentation for more help.
