Strip Tags and everything in between
As you’re dealing with HTML, you should use an HTML parser to process it correctly. You can use PHP’s DOMDocument and query the elements with DOMXPath, e.g.:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1') as $node) {
$node->parentNode->removeChild($node);
}
$html = $doc->saveHTML();
Try this:
preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', '<h1>including this content</h1>');
Example:
echo preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', 'Hello<h1>including this content</h1> There !!');
Output:
Hello There
If you want to strip ALL tags and including content:
$yourString = 'Hello <div>Planet</div> Earth. This is some <span class="foo">sample</span> content!';
$regex = '/<[^>]*>[^<]*<[^>]*>/';
echo preg_replace($regex, '', $yourString);
#=> Hello Earth. This is some content!
HTML attributes can contain <
or >
. So, if your HTML gets too messy this method will not work and you'll need a DOM parser.
Regular Expression Explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
[^>]* any character except: '>' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
[^<]* any character except: '<' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
[^>]* any character except: '>' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
> '>'