Preventing XSS in Node.js / server side javascript
Any idea how one would go about preventing XSS attacks on a node.js app? Any libs out there that handle removing javascript in hrefs, onclick attributes,etc. from POSTed data?
I don't want to have to write a regex for all that :)
Any suggestions?
I've created a module that bundles the Caja HTML Sanitizer
npm install sanitizer
http://github.com/theSmaw/Caja-HTML-Sanitizer
https://www.npmjs.com/package/sanitizer
Any feedback appreciated.
One of the answers to Sanitize/Rewrite HTML on the Client Side suggests borrowing the whitelist-based HTML sanitizer in JS from Google Caja which, as far as I can tell from a quick scroll-through, implements an HTML SAX parser without relying on the browser's DOM.
Update: Also, keep in mind that the Caja sanitizer has apparently been given a full, professional security review while regexes are known for being very easy to typo in security-compromising ways.
Update 2017-09-24: There is also now DOMPurify. I haven't used it yet, but it looks like it meets or exceeds every point I look for:
-
Relies on functionality provided by the runtime environment wherever possible. (Important both for performance and to maximize security by relying on well-tested, mature implementations as much as possible.)
- Relies on either a browser's DOM or jsdom for Node.JS.
-
Default configuration designed to strip as little as possible while still guaranteeing removal of javascript.
- Supports HTML, MathML, and SVG
- Falls back to Microsoft's proprietary, un-configurable
toStaticHTML
under IE8 and IE9.
-
Highly configurable, making it suitable for enforcing limitations on an input which can contain arbitrary HTML, such as a WYSIWYG or Markdown comment field. (In fact, it's the top of the pile here)
- Supports the usual tag/attribute whitelisting/blacklisting and URL regex whitelisting
- Has special options to sanitize further for certain common types of HTML template metacharacters.
-
They're serious about compatibility and reliability
- Automated tests running on 16 different browsers as well as three diffferent major versions of Node.JS.
- To ensure developers and CI hosts are all on the same page, lock files are published.
All usual techniques apply to node.js output as well, which means:
- Blacklists will not work.
- You're not supposed to filter input in order to protect HTML output. It will not work or will work by needlessly malforming the data.
- You're supposed to HTML-escape text in HTML output.
I'm not sure if node.js comes with some built-in for this, but something like that should do the job:
function htmlEscape(text) {
return text.replace(/&/g, '&').
replace(/</g, '<'). // it's not neccessary to escape >
replace(/"/g, '"').
replace(/'/g, ''');
}
I recently discovered node-validator by chriso.
Example
get('/', function (req, res) {
//Sanitize user input
req.sanitize('textarea').xss(); // No longer supported
req.sanitize('foo').toBoolean();
});
XSS Function Deprecation
The XSS function is no longer available in this library.
https://github.com/chriso/validator.js#deprecations