RegExp to strip HTML comments
Looking for a regexp sequence of matches and replaces (preferably PHP but doesn't matter) to change this (the start and end is just random text that needs to be preserved).
IN:
fkdshfks khh fdsfsk
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<!--eg1-->
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
<!--gc2-->
<!--bXNnYm94-->
<!--egc2-->
<!--g2-->
</div>
<!--eg2-->
fdsfdskh
to this OUT:
fkdshfks khh fdsfsk
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
</div>
fdsfdskh
Thanks.
Solution 1:
Are you just trying to remove the comments? How about
s/<!--[^>]*-->//g
or the slightly better (suggested by the questioner himself):
<!--(.*?)-->
But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.
Solution 2:
preg_replace('/<!--(.*)-->/Uis', '', $html)
This PHP code will remove all html comment tags from the $html string.
Solution 3:
A better version would be:
(?=<!--)([\s\S]*?)-->
It matches html comments like these:
<!--
multi line html comment
-->
or
<!-- single line html comment -->
and what is most important it matches comments like this (the other regex shown by others do not cover this situation):
<!-- this is my blog: <mynixworld.inf> -->
Note
Although syntactically the one below is a html comment your browser might parse it somehow differently and thus it might have a special meaning. Stripping such strings might break your code.
<!--[if !(IE 8) ]><!-->