Fastest method to escape HTML tags as HTML entities?

I'm writing a Chrome extension that involves doing a lot of the following job: sanitizing strings that might contain HTML tags, by converting <, > and & to &lt;, &gt; and &amp;, respectively.

(In other words, the same as PHP's htmlspecialchars(str, ENT_NOQUOTES) – I don't think there's any real need to convert double-quote characters.)

This is the fastest function I have found so far:

function safe_tags(str) {
    return str.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;') ;
}

But there's still a big lag when I have to run a few thousand strings through it in one go.

Can anyone improve on this? It's mostly for strings between 10 and 150 characters, if that makes a difference.

(One idea I had was not to bother encoding the greater-than sign – would there be any real danger with that?)


Here's one way you can do this:

var escape = document.createElement('textarea');
function escapeHTML(html) {
    escape.textContent = html;
    return escape.innerHTML;
}

function unescapeHTML(html) {
    escape.innerHTML = html;
    return escape.textContent;
}

Here's a demo.


You could try passing a callback function to perform the replacement:

var tagsToReplace = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;'
};

function replaceTag(tag) {
    return tagsToReplace[tag] || tag;
}

function safe_tags_replace(str) {
    return str.replace(/[&<>]/g, replaceTag);
}

Here is a performance test: http://jsperf.com/encode-html-entities to compare with calling the replace function repeatedly, and using the DOM method proposed by Dmitrij.

Your way seems to be faster...

Why do you need it, though?


Martijn's method as a prototype function:

String.prototype.escape = function() {
    var tagsToReplace = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;'
    };
    return this.replace(/[&<>]/g, function(tag) {
        return tagsToReplace[tag] || tag;
    });
};

var a = "<abc>";
var b = a.escape(); // "&lt;abc&gt;"

An even quicker/shorter solution is:

escaped = new Option(html).innerHTML

This is related to some weird vestige of JavaScript whereby the Option element retains a constructor that does this sort of escaping automatically.

Credit to https://github.com/jasonmoo/t.js/blob/master/t.js


The fastest method is:

function escapeHTML(html) {
    return document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML;
}

This method is about twice faster than the methods based on 'replace', see http://jsperf.com/htmlencoderegex/35 .

Source: https://stackoverflow.com/a/17546215/698168