How to get the Open Graph Protocol data of a webpage with PHP?

PHP has a simple function for getting the meta tags of a webpage (get_meta_tags()), but it only works for meta tags that have a name attribute. Open Graph Protocol tags, which use a property attribute instead, are becoming more and more popular these days. What is the easiest way to get the og values from a webpage? For example:

<meta property="og:url" content=""> 
<meta property="og:title" content=""> 
<meta property="og:description" content=""> 
<meta property="og:type" content="">

The basic approach I can see is to fetch the page with cURL and parse it with a regex. Any ideas?
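A minimal sketch of the get_meta_tags() limitation described in the question. It writes a made-up sample page to a temporary file (get_meta_tags() takes a filename or URL, not a string) and shows that only the name-based tag comes back, while the property-based og: tags are ignored.

```php
<?php
// Sample markup invented for illustration: one name-based tag, two Open Graph tags.
$sampleHtml = <<<HTML
<html><head>
<meta name="description" content="A plain description">
<meta property="og:title" content="An Open Graph title">
<meta property="og:type" content="article">
</head><body></body></html>
HTML;

// get_meta_tags() reads from a file or URL, so store the sample in a temp file first.
$tmpFile = tempnam(sys_get_temp_dir(), 'og');
file_put_contents($tmpFile, $sampleHtml);
$tags = get_meta_tags($tmpFile);
unlink($tmpFile);

// Only <meta name="..." content="..."> pairs are collected; property attributes are skipped.
print_r($tags); // prints only the "description" entry
```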

Really simple and well done:

Using https://github.com/scottmac/opengraph

$graph = OpenGraph::fetch('http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html');
print_r($graph);

This will return:

OpenGraph Object

(
    [_values:OpenGraph:private] => Array
        (
            [type] => article
            [video] => http://www.avessotv.com.br/player/flowplayer/flowplayer-3.2.7.swf?config=%7B%27clip%27%3A%7B%27url%27%3A%27http%3A%2F%2Fwww.avessotv.com.br%2Fmedia%2Fprogramas%2Fpantene.flv%27%7D%7D
            [image] => /wp-content/thumbnails/9025.jpg
            [site_name] => Programa Avesso - Bastidores
            [title] => Bastidores “Pantene Institute Experience” P&G
            [url] => http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html
            [description] => Confira os bastidores do Pantene Institute Experience, da Procter &#038; Gamble. www.pantene.com.br Mais imagens:
        )

    [_position:OpenGraph:private] => 0
)

When parsing data from HTML, you really shouldn't use regex. Take a look at the DOMXPath::query() method instead.

Now, the actual code could be:

[EDIT] A better XPath query was suggested by Stefan Gehrig, so the code can be shortened to:

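// $html holds the page markup, fetched e.g. with cURL or file_get_contents()
libxml_use_internal_errors(true); // collect libxml warnings about malformed HTML instead of silencing them with @
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors(); // discard the collected warnings
$xpath = new DOMXPath($doc);
// select only <meta> elements whose property attribute starts with "og:"
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}
var_dump($rmetas);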

Instead of:

$doc = new DomDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    if(!empty($property) && preg_match('#^og:#', $property)) {
        $rmetas[$property] = $content;
    }
}
var_dump($rmetas);
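The shortened XPath version can be wrapped in a reusable function. This is a sketch, not part of any library: the name extractOpenGraph() is hypothetical, and the sample HTML is made up. It also restores the caller's libxml error setting afterwards, and one of the sample tags deliberately lists content before property to show that attribute order does not matter to the DOM parser.

```php
<?php
// Hypothetical helper: returns an array of og:* property => content pairs.
function extractOpenGraph(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $xpath = new DOMXPath($doc);
    $result = [];
    // Same query as above: <meta> elements whose property attribute starts with "og:".
    foreach ($xpath->query('//*/meta[starts-with(@property, "og:")]') as $meta) {
        $result[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
    return $result;
}

$html = '<html><head>'
    . '<meta property="og:title" content="Example title">'
    . '<meta content="article" property="og:type">'
    . '<meta name="description" content="Not Open Graph">'
    . '</head><body></body></html>';

print_r(extractOpenGraph($html));
```

One caveat: DOMDocument::loadHTML() assumes ISO-8859-1 when the markup does not declare a charset, so non-ASCII values from UTF-8 pages can come out garbled unless the page declares its encoding.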

How about:

preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $str, $matches);
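A short sketch of turning the regex matches into a property => content array with array_combine(). The $str value is invented sample markup. Note that this pattern only matches tags written with property before content and with double-quoted values, so the third tag below is silently skipped; this is the kind of fragility the DOM-based answer warns about.

```php
<?php
// Made-up sample; in practice $str would hold the fetched page.
$str = '<meta property="og:title" content="Example title">'
     . '<meta property="og:url" content="http://example.com/">'
     . '<meta content="article" property="og:type">'; // content first: not matched by the regex

preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $str, $matches);

// $matches[1] holds the property names, $matches[2] the corresponding content values.
$og = array_combine($matches[1], $matches[2]);
print_r($og);
```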

So, yes: grab the page any way you can and parse it with a regex.
