Check if a string is HTML or not
A better regex to use to check if a string is HTML is:
/^/
For example:
/^/.test('') // true
/^/.test('foo bar baz') // true
/^/.test('<p>fizz buzz</p>') // true
In fact, it's so good that it'll return true for every string passed to it, because every string is HTML. Seriously, even if it's poorly formatted or invalid, it's still HTML.
If what you're looking for is the presence of HTML elements, rather than simply any text content, you could use something along the lines of:
/<\/?[a-z][\s\S]*>/i.test()
It won't help you parse the HTML in any way, but it will certainly flag the string as containing HTML elements.
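As a rough illustration of how this check behaves, the regex can be wrapped in a small helper. The name containsHtmlElement is made up for this sketch and does not come from any library. The last call shows a false positive: the pattern only looks for a < followed by a letter and, somewhere later, a >.

```javascript
// Hypothetical helper (name chosen for this sketch): reports whether a string
// contains something that looks like an opening or closing HTML tag.
function containsHtmlElement(str) {
  return /<\/?[a-z][\s\S]*>/i.test(str);
}

console.log(containsHtmlElement('<p>fizz buzz</p>')); // true
console.log(containsHtmlElement('foo </b> bar'));     // true (stray closing tag)
console.log(containsHtmlElement('foo bar baz'));      // false
console.log(containsHtmlElement('1 < 2 and 3 > 2'));  // false ("<" is not followed by a letter)
console.log(containsHtmlElement('x <y and z> w'));    // true (false positive)
```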
Method #1. Here is a simple function to test whether a string contains HTML:
function isHTML(str) {
  var a = document.createElement('div');
  a.innerHTML = str;
  // Walk the parsed child nodes; any ELEMENT_NODE (nodeType 1) means HTML.
  for (var c = a.childNodes, i = c.length; i--; ) {
    if (c[i].nodeType === 1) return true;
  }
  return false;
}
The idea is to let the browser's DOM parser decide whether the provided string looks like HTML. As you can see, it simply checks for an ELEMENT_NODE (nodeType of 1).
I ran a couple of tests and it looks like it works:
isHTML('<a>this is a string</a>') // true
isHTML('this is a string') // false
isHTML('this is a <b>string</b>') // true
This solution will properly detect an HTML string; however, it has the side effect that img/video/etc. tags will start downloading their resources once parsed in innerHTML.
Method #2. Another method uses DOMParser and doesn't have the resource-loading side effect:
function isHTML(str) {
  var doc = new DOMParser().parseFromString(str, "text/html");
  return Array.from(doc.body.childNodes).some(node => node.nodeType === 1);
}
Notes:
1. Array.from is an ES2015 method; it can be replaced with [].slice.call(doc.body.childNodes).
2. The arrow function in the some call can be replaced with a regular anonymous function.
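With both notes applied, an ES5-only version of Method #2 might look like the sketch below. It assumes a browser environment where DOMParser is available, so it will not run as-is in plain Node.js.

```javascript
// ES5-only variant of the DOMParser approach, applying the two notes above.
// Assumes a browser environment where DOMParser is available.
function isHTML(str) {
  var doc = new DOMParser().parseFromString(str, 'text/html');
  // [].slice.call turns the NodeList into a real array (replaces Array.from).
  return [].slice.call(doc.body.childNodes).some(function (node) {
    return node.nodeType === 1; // 1 is Node.ELEMENT_NODE
  });
}
```

The behavior should match the ES2015 version; only the syntax differs.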
A little bit of validation with:
/<(?=.*? .*?\/ ?>|br|hr|input|!--|wbr)[a-z]+.*?>|<([a-z]+).*?<\/\1>/i.test(htmlStringHere)
This searches for empty tags (some predefined) and /-terminated XHTML empty tags and validates the string as HTML because of the empty tag, OR it captures the tag name and attempts to find its closing tag somewhere in the string to validate it as HTML.
Explained demo: http://regex101.com/r/cX0eP2
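To make the behavior of this regex more concrete, here is a small sketch that runs it against a few inputs. The names lightweightHtmlRegex and looksLikeHtml are made up for this example. Note the last three cases: an img tag without the trailing / is not on the predefined list, so it is not detected, and because . does not match newlines, a tag pair split across lines is missed too.

```javascript
// The lightweight regex from above, stored under a hypothetical name.
var lightweightHtmlRegex = /<(?=.*? .*?\/ ?>|br|hr|input|!--|wbr)[a-z]+.*?>|<([a-z]+).*?<\/\1>/i;

// Hypothetical helper used only for this sketch.
function looksLikeHtml(str) {
  return lightweightHtmlRegex.test(str);
}

console.log(looksLikeHtml('<p>Testing</p>'));          // true (matching open/close pair)
console.log(looksLikeHtml('<br>'));                    // true (predefined empty tag)
console.log(looksLikeHtml('<img src="hello.jpg" />')); // true (/-terminated empty tag)
console.log(looksLikeHtml('My < weird > string'));     // false
console.log(looksLikeHtml('<img src="hello.jpg">'));   // false (no trailing /, not predefined)
console.log(looksLikeHtml('<div>\nhi\n</div>'));       // false (. does not cross newlines)
```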
Update:
Complete validation with:
/<(br|basefont|hr|input|source|frame|param|area|meta|!--|col|link|option|base|img|wbr|!DOCTYPE).*?>|<(a|abbr|acronym|address|applet|article|aside|audio|b|bdi|bdo|big|blockquote|body|button|canvas|caption|center|cite|code|colgroup|command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frameset|head|header|hgroup|h1|h2|h3|h4|h5|h6|html|i|iframe|ins|kbd|keygen|label|legend|li|map|mark|menu|meter|nav|noframes|noscript|object|ol|optgroup|output|p|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video).*?<\/\2>/i.test(htmlStringHere)
This validates more strictly because it lists HTML tag names explicitly: empty ones first, followed by the rest, which need a closing tag. The list is not exhaustive, though; main, for example, is missing.
Explained demo here: http://regex101.com/r/pE1mT5
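A pattern this long is hard to maintain by hand. One possible approach, shown here only as a sketch, is to assemble the same kind of regex from two arrays of tag names. The function buildHtmlRegex and the abbreviated tag lists are assumptions made for this illustration; they are not the full lists from the regex above.

```javascript
// Abbreviated, illustrative lists; NOT the full set used in the regex above.
var voidTags = ['br', 'hr', 'img', 'input', 'meta', 'link', 'wbr'];
var pairedTags = ['a', 'b', 'div', 'p', 'span', 'table', 'ul', 'li'];

// Hypothetical helper: builds "<(void tags).*?>|<(paired tags).*?</\2>".
function buildHtmlRegex(voidNames, pairedNames) {
  return new RegExp(
    '<(' + voidNames.join('|') + ').*?>|<(' + pairedNames.join('|') + ').*?<\\/\\2>',
    'i'
  );
}

var htmlRegex = buildHtmlRegex(voidTags, pairedTags);

console.log(htmlRegex.test('<img src="hello.jpg">')); // true (void tag)
console.log(htmlRegex.test('<p>Testing</p>'));        // true (paired tag with closing tag)
console.log(htmlRegex.test('<p>unclosed'));           // false (closing tag missing)
console.log(htmlRegex.test('<foo>bar</foo>'));        // false (tag not in either list)
```

Adding a tag then means adding one array entry rather than editing the pattern itself.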
zzzzBov's answer above is good, but it does not account for stray closing tags, such as:
/<[a-z][\s\S]*>/i.test('foo </b> bar'); // false
A version that also catches closing tags could be this:
/<[a-z/][\s\S]*>/i.test('foo </b> bar'); // true
Here's a sloppy one-liner that I use from time to time:
var isHTML = RegExp.prototype.test.bind(/(<([^>]+)>)/i);
It will basically return true for strings containing a < followed by ANYTHING followed by >.
By ANYTHING, I mean basically anything except an empty string.
It's not great, but it's a one-liner.
Usage
isHTML('Testing'); // false
isHTML('<p>Testing</p>'); // true
isHTML('<img src="hello.jpg">'); // true
isHTML('My < weird > string'); // true (caution!!!)
isHTML('<>'); // false
As you can see it's far from perfect, but might do the job for you in some cases.
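For anyone puzzled by the bind trick: it produces a function that calls test with the regex as this. A sketch of the bound version next to a roughly equivalent explicit function is shown below; the names tagLikeRegex, isHTMLBound, and isHTMLExplicit are made up for this comparison.

```javascript
// The regex from the one-liner, stored under a hypothetical name.
var tagLikeRegex = /(<([^>]+)>)/i;

// The one-liner form: test is bound so that `this` is always tagLikeRegex.
var isHTMLBound = RegExp.prototype.test.bind(tagLikeRegex);

// A roughly equivalent explicit form.
function isHTMLExplicit(str) {
  return tagLikeRegex.test(str);
}

console.log(isHTMLBound('<p>Testing</p>'));         // true
console.log(isHTMLExplicit('<p>Testing</p>'));      // true
console.log(isHTMLBound('My < weird > string'));    // true (same false positive)
console.log(isHTMLExplicit('My < weird > string')); // true
console.log(isHTMLBound('<>'));                     // false
console.log(isHTMLExplicit('<>'));                  // false
```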