Will HTML Encoding prevent all kinds of XSS attacks?

I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.

Is there some way to do an XSS attack even if HTML Encode is used?


No.

Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.

For instance, consider server-generated client-side javascript - the server dynamically outputs htmlencoded values directly into the client-side javascript, htmlencode will not stop injected script from executing.

Next, consider the following pseudocode:

<input value=<%= HtmlEncode(somevar) %> id=textbox>

Now, in case its not immediately obvious, if somevar (sent by the user, of course) is set for example to

a onclick=alert(document.cookie)

the resulting output is

<input value=a onclick=alert(document.cookie) id=textbox>

which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.

There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).

Also don't forget about UTF-7 type attacks - where the attack looks like

+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-

Nothing much to encode there...

The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF you're output context IS HTML, or maybe you need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.

If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.

Note that all encoding should not be restricted to user input, but also stored values from the database, text files, etc.

Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...

Some more information, and a pretty definitive list (constantly updated), check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html


If you systematically encode all user input before displaying then yes, you are safe you are still not 100 % safe.
(See @Avid's post for more details)

In addition problems arise when you need to let some tags go unencoded so that you allow users to post images or bold text or any feature that requires user's input be processed as (or converted to) un-encoded markup.

You will have to set up a decision making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to let a non allowed tag to pass through.

It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).


If you encode everything it will. (depending on your platform and the implementation of htmlencode) But any usefull web application is so complex that it's easy to forget to check every part of it. Or maybe a 3rd party component isn't safe. Or maybe some code path that you though did encoding didn't do it so you forgot it somewhere else.

So you might want to check things on the input side too. And you might want to check stuff you read from the database.