How to escape XML entities in JavaScript?
In JavaScript (server-side Node.js) I'm writing a program which generates XML as output.
I am building the XML by concatenating a string:
str += '<' + key + '>';
str += value;
str += '</' + key + '>';
The problem is: what if value contains characters like '&', '>' or '<'?
What's the best way to escape those characters? Or is there any JavaScript library around that can escape XML entities?
Solution 1:
HTML encoding is simply replacing the &, ", ', < and > characters with their entity equivalents. Order matters: if you don't replace the & characters first, you'll double encode some of the entities:
if (!String.prototype.encodeHTML) {
  String.prototype.encodeHTML = function () {
    // Replace & first so the entities produced below are not re-encoded.
    return this.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;')
               .replace(/"/g, '&quot;')
               .replace(/'/g, '&apos;');
  };
}
As @Johan B.W. de Vries pointed out, encodeHTML will have issues with tag names. To clarify, I assumed it was being used for the value only.
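To illustrate why the order of replacements matters when encoding, here is a minimal sketch. The two helper names are hypothetical and each handles only & and <; the only difference between them is which character is replaced first.

```javascript
// Hypothetical helpers for illustration; only the replacement order differs.
function escapeAmpFirst(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

function escapeAmpLast(s) {
  // The & inside the freshly produced "&lt;" gets encoded again.
  return s.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

console.log(escapeAmpFirst('a < b & c')); // a &lt; b &amp; c
console.log(escapeAmpLast('a < b & c'));  // a &amp;lt; b &amp; c (double encoded)
```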
Conversely, if you want to decode HTML entities [1], make sure you decode &amp; to & after everything else so that you don't double decode any entities:
if (!String.prototype.decodeHTML) {
  String.prototype.decodeHTML = function () {
    // Decode &amp; last so already-decoded text is not decoded a second time.
    return this.replace(/&apos;/g, "'")
               .replace(/&quot;/g, '"')
               .replace(/&gt;/g, '>')
               .replace(/&lt;/g, '<')
               .replace(/&amp;/g, '&');
  };
}
[1] Just the basics, not including &copy; to © or other such things.
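The same ordering concern applies when decoding. A small sketch, with hypothetical helper names and only two entities handled, shows what goes wrong if &amp; is decoded first:

```javascript
// Hypothetical helpers for illustration; only the replacement order differs.
function decodeAmpLast(s) {
  return s.replace(/&lt;/g, '<').replace(/&amp;/g, '&');
}

function decodeAmpFirst(s) {
  // "&amp;lt;" becomes "&lt;", which the second replace then turns into "<".
  return s.replace(/&amp;/g, '&').replace(/&lt;/g, '<');
}

// The literal text "&lt;" is encoded as "&amp;lt;".
console.log(decodeAmpLast('&amp;lt;'));  // &lt;
console.log(decodeAmpFirst('&amp;lt;')); // < (double decoded)
```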
As far as libraries are concerned, Underscore.js (or Lodash, if you prefer) provides an _.escape method to perform this functionality.
Solution 2:
This might be a bit more efficient with the same outcome:
function escapeXml(unsafe) {
  // One pass over the string; each special character is mapped to its entity.
  return unsafe.replace(/[<>&'"]/g, function (c) {
    switch (c) {
      case '<': return '&lt;';
      case '>': return '&gt;';
      case '&': return '&amp;';
      case '\'': return '&apos;';
      case '"': return '&quot;';
    }
  });
}
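Applied to the string concatenation from the question, this could look roughly like the sketch below. It swaps the switch for a lookup table (same outcome), and the element helper is a hypothetical name that assumes key is already a valid XML element name.

```javascript
// Lookup-table variant of the single-regex approach.
function escapeXml(unsafe) {
  const entities = { '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;' };
  return String(unsafe).replace(/[<>&'"]/g, (c) => entities[c]);
}

// Hypothetical helper: `key` is assumed to already be a valid XML element name,
// so only `value` is escaped.
function element(key, value) {
  return '<' + key + '>' + escapeXml(value) + '</' + key + '>';
}

console.log(element('title', 'Tom & Jerry <3'));
// <title>Tom &amp; Jerry &lt;3</title>
```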
Solution 3:
If you have jQuery, here's a simple solution:
String.prototype.htmlEscape = function() {
return $('<div/>').text(this.toString()).html();
};
Use it like this:
"<foo&bar>".htmlEscape();
-> "&lt;foo&amp;bar&gt;"
Solution 4:
You can use the method below. I have added it to the String prototype for easier access. I have also used a negative look-ahead so that it won't mess things up if you call the method twice or more.
Usage:
var original = "Hi&there";
var escaped = original.EncodeXMLEscapeChars(); //Hi&amp;there
Decoding is handled automatically by the XML parser.
Method:
// String extension to format a string for XML content.
// Replaces XML escape characters with their equivalent entity notation.
// Note: $.trim assumes jQuery is loaded.
String.prototype.EncodeXMLEscapeChars = function () {
    var OutPut = this;
    if ($.trim(OutPut) != "") {
        OutPut = OutPut.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#39;");
        // Negative look-ahead: skip any & that already starts a known entity.
        OutPut = OutPut.replace(/&(?!(amp;)|(lt;)|(gt;)|(quot;)|(#39;)|(apos;))/g, "&amp;");
        OutPut = OutPut.replace(/([^\\])((\\\\)*)\\(?![\\/{])/g, "$1\\\\$2"); // replaces odd backslash(\\) with even.
    }
    else {
        OutPut = "";
    }
    return OutPut;
};
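The effect of the negative look-ahead can be seen in isolation with a small sketch. The function name is hypothetical, and it covers only the ampersand step, without jQuery or a prototype change:

```javascript
// Hypothetical stand-alone version of the ampersand step.
function escapeAmpOnce(s) {
  // Only encode an & that does not already start one of the listed entities.
  return s.replace(/&(?!amp;|lt;|gt;|quot;|#39;|apos;)/g, '&amp;');
}

const once = escapeAmpOnce('Hi&there');
const twice = escapeAmpOnce(once);
console.log(once);  // Hi&amp;there
console.log(twice); // Hi&amp;there (unchanged on the second call)

// Trade-off: input that already looks like an entity is left alone,
// so the literal text "&lt;" cannot be represented with this approach.
console.log(escapeAmpOnce('&lt;')); // &lt;
```

The trade-off is worth keeping in mind: calling the method repeatedly is safe, but input that legitimately contains entity-like text is passed through unescaped.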
Solution 5:
Caution: chaining regex replacements isn't safe if you have XML inside XML.
Instead, loop over the string once and substitute all escape characters in that single pass.
That way, you can't process the same character twice.
function _xmlAttributeEscape(inputString)
{
    var output = [];
    // Visit each character exactly once and emit either it or its entity.
    for (var i = 0; i < inputString.length; ++i)
    {
        switch (inputString[i])
        {
            case '&':
                output.push("&amp;");
                break;
            case '"':
                output.push("&quot;");
                break;
            case "<":
                output.push("&lt;");
                break;
            case ">":
                output.push("&gt;");
                break;
            default:
                output.push(inputString[i]);
        }
    }
    return output.join("");
}
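As a usage sketch for the single-pass approach: the compact version below handles the same four characters (&, ", <, >), and the attribute helper is a hypothetical name. Because the single quote is not escaped, the sketch assumes attribute values are always wrapped in double quotes.

```javascript
// Compact single-pass escape covering the same four characters (&, ", <, >).
function xmlAttributeEscape(input) {
  const map = { '&': '&amp;', '"': '&quot;', '<': '&lt;', '>': '&gt;' };
  let out = '';
  for (const ch of input) {
    out += map[ch] || ch; // each character is looked at exactly once
  }
  return out;
}

// Hypothetical helper: `name` is assumed to be a valid attribute name, and the
// value is always wrapped in double quotes because ' is not escaped.
function attribute(name, value) {
  return name + '="' + xmlAttributeEscape(value) + '"';
}

console.log(attribute('title', 'say "hi" & <wave>'));
// title="say &quot;hi&quot; &amp; &lt;wave&gt;"

// XML inside XML: escaping an already escaped string encodes it one level deeper.
const inner = xmlAttributeEscape('<b>');
console.log(inner);                     // &lt;b&gt;
console.log(xmlAttributeEscape(inner)); // &amp;lt;b&amp;gt;
```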