What's the right way to decode a string that has special HTML entities in it? [duplicate]
Solution 1:
This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.
function decodeHtml(html) {
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
}
Example: http://jsfiddle.net/k65s3/
Input:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Output:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Solution 2:
Don’t use the DOM to do this. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results.
For a robust & deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.
Here’s how you’d use it:
he.decode("We're unable to complete your request at this time.");
→ "We're unable to complete your request at this time."
Disclaimer: I'm the author of the he library.
See this Stack Overflow answer for some more info.
Solution 3:
If you don't want to use html/dom, you could use regex. I haven't tested this; but something along the lines of:
function parseHtmlEntities(str) {
return str.replace(/&#([0-9]{1,3});/gi, function(match, numStr) {
var num = parseInt(numStr, 10); // read num as normal number
return String.fromCharCode(num);
});
}
[Edit]
Note: this would only work for numeric html-entities, and not stuff like &oring;.
[Edit 2]
Fixed the function (some typos), test here: http://jsfiddle.net/Be2Bd/1/
Solution 4:
jQuery will encode and decode for you.
function htmlDecode(value) {
return $("<textarea/>").html(value).text();
}
function htmlEncode(value) {
return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
$("#encoded")
.text(htmlEncode("<img src onerror='alert(0)'>"));
$("#decoded")
.text(htmlDecode("<img src onerror='alert(0)'>"));
});
</script>
<span>htmlEncode() result:</span><br/>
<div id="encoded"></div>
<br/>
<span>htmlDecode() result:</span><br/>
<div id="decoded"></div>