Simple HTML sanitizer in JavaScript

Solution 1:

Have a look at the sanitizer recommended in this question: "Sanitize/Rewrite HTML on the Client Side".

To make sure you don't need to do more about XSS, also review the answers to this one: "How to prevent Javascript injection attacks within user-generated HTML".

Solution 2:

We've developed a simple HtmlSanitizer and open-sourced it here: https://github.com/jitbit/HtmlSanitizer

Usage

var result = HtmlSanitizer.SanitizeHtml(input);

[Disclaimer! I'm one of the authors!]

Solution 3:

Here is a 2kb Vue component that renders escaped markdown. It depends on Snarkdown, a 1kb markdown renderer; replace it with whatever you need. Optionally, it can first translate <b> and <i> tags into markdown, for content that already uses those tags for formatting.

<template>
  <div v-html="html">
  </div>
</template>

<script>
import Snarkdown from 'snarkdown'
export default {
  props: ['code', 'bandi'],
  computed: {
    html () {
      // Convert b & i tags if flagged...
      const unsafe = this.bandi ? this.code
        .replace(/<b>/g, '**')
        .replace(/<\/b>/g, '**')
        .replace(/<i>/g, '*')
        .replace(/<\/i>/g, '*') : this.code

      // Process the markdown after we escape the html tags...
      return Snarkdown(unsafe
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#039;')
      )
    }
  }
}
</script>

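As a comparison, vue-markdown is over 100kb. This component won't render math formulas and the like, but most people never need those, so I'm not sure why the most popular markdown components are so bloated :(

Because every HTML tag is escaped before the markdown is rendered, this blocks tag-based XSS, and it is very fast. The escaping does not touch link URLs inside the markdown (a javascript: URL, for example), so check how your renderer handles those.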

Why did I use &#039; and not &apos;? Because &apos; is not defined in HTML 4, so older browsers may not decode it. See: "Why shouldn't `&apos;` be used to escape single quotes?"
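The escaping step from the component above can also be pulled out into a standalone function. This is only a minimal sketch: the name escapeHtml is hypothetical, and the replacements are the same ones the component applies. The ampersand has to be replaced first, otherwise the & in the entities produced by the later replacements would be escaped a second time.

```javascript
// Hypothetical helper (not part of Snarkdown or Vue): escapes the five
// characters that are significant in HTML text and attribute values.
function escapeHtml (text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first, see the note above
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;') // numeric entity instead of &apos;
}

// Example: the tag is turned into harmless text.
const escaped = escapeHtml('<b onclick="x">Tom & Jerry\'s</b>')
console.log(escaped)
```

The result is meant to be placed inside HTML text or a quoted attribute value; it is not a substitute for URL validation.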

And now for something completely different, but related...

Not sure why this hasn't been mentioned yet... but your browser can sanitize for you.

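Here is a 3-line HTML stripper that runs much faster than a parser written in JavaScript, because it uses the native HTML parser that ships with your browser. Note this does NOT escape HTML, it removes the tags and returns plain text. Parse into a <template> element rather than a <div>: assigning untrusted markup to a div's innerHTML can still run event handlers such as <img src=x onerror=...>, even when the div is never attached to the page, whereas template content is inert.

// Template content lives in an inert document: no scripts run, no images load
const decoder = document.createElement('template')
decoder.innerHTML = YourXSSAttackHere
const sanitized = decoder.content.textContent

The result is plain text with entities decoded, so insert it with textContent or escape it again before using it as HTML.

Vue.js uses the same innerHTML/textContent pattern, with a plain div, to decode HTML entities in its compiler: https://github.com/vuejs/vue/blob/dev/src/compiler/parser/entity-decoder.js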

Solution 4:

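Another hint: as of May 2021 there is an upcoming Sanitizer API in Firefox. It is still a draft, so method names may change; check the MDN page linked below for its current shape.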

// our input string to clean
const stringToClean = 'Some text <b><i>with</i></b> <blink>tags</blink>, including a rogue script <script>alert(1)</script> def.';

const result = new Sanitizer().sanitizeToString(stringToClean);
console.log(result);
// Logs: "Some text <b><i>with</i></b>, including a rogue script def."

(MDN example)

See: https://developer.mozilla.org/en-US/docs/Web/API/HTML_Sanitizer_API

If other vendors adopt this feature as well, it might let us get rid of JavaScript sanitizer implementations.
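Until the API is widely available, it has to be feature-detected. The following is a rough sketch, not a recommended implementation: it assumes the sanitizeToString method from the May 2021 draft quoted above, and falls back to plain escaping when that method is missing. The names sanitizeOrEscape, escapeFallback and the scope parameter are hypothetical.

```javascript
// Fallback: escape every HTML special character instead of sanitizing.
function escapeFallback (text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;')
}

// `scope` is a hypothetical parameter that defaults to the global object;
// it exists only so the detection can be exercised outside a browser.
function sanitizeOrEscape (input, scope = globalThis) {
  const Native = scope.Sanitizer
  // Assumption: the draft method name shown in the MDN example above.
  if (typeof Native === 'function' &&
      typeof Native.prototype.sanitizeToString === 'function') {
    return new Native().sanitizeToString(input)
  }
  return escapeFallback(input)
}

console.log(sanitizeOrEscape('<script>alert(1)</script>'))
```

The two branches behave differently: the native branch keeps allowed tags such as <b>, while the fallback turns all markup into text, so callers should be prepared for either.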