How to strip HTML tags from string in JavaScript? [duplicate]

cleanText = strInputCode.replace(/<\/?[^>]+(>|$)/g, "");

Distilled from this website (web.achive).

This regex looks for <, an optional slash /, one or more characters that are not >, then either > or $ (the end of the line)

Examples:

'<div>Hello</div>' ==> 'Hello'
 ^^^^^     ^^^^^^
'Unterminated Tag <b' ==> 'Unterminated Tag '
                  ^^

But it is not bulletproof:

'If you are < 13 you cannot register' ==> 'If you are '
            ^^^^^^^^^^^^^^^^^^^^^^^^
'<div data="score > 42">Hello</div>' ==> ' 42">Hello'
 ^^^^^^^^^^^^^^^^^^          ^^^^^^

If someone is trying to break your application, this regex will not protect you. It should only be used if you already know the format of your input. As other knowledgable and mostly sane people have pointed out, to safely strip tags, you must use a parser.

If you do not have acccess to a convenient parser like the DOM, and you cannot trust your input to be in the right format, you may be better off using a package like sanitize-html, and also other sanitizers are available.


Using the browser's parser is the probably the best bet in current browsers. The following will work, with the following caveats:

  • Your HTML is valid within a <div> element. HTML contained within <body> or <html> or <head> tags is not valid within a <div> and may therefore not be parsed correctly.
  • textContent (the DOM standard property) and innerText (non-standard) properties are not identical. For example, textContent will include text within a <script> element while innerText will not (in most browsers). This only affects IE <=8, which is the only major browser not to support textContent.
  • The HTML does not contain <script> elements.
  • The HTML is not null
  • The HTML comes from a trusted source. Using this with arbitrary HTML allows arbitrary untrusted JavaScript to be executed. This example is from a comment by Mike Samuel on the duplicate question: <img onerror='alert(\"could run arbitrary JS here\")' src=bogus>

Code:

var html = "<p>Some HTML</p>";
var div = document.createElement("div");
div.innerHTML = html;
var text = div.textContent || div.innerText || "";