How to use C# to sanitize input on an HTML page?

We are using the HtmlSanitizer .NET library, which:

  • Is open-source (MIT) - GitHub link
  • Is fully customizable, e.g. you can configure which elements should be removed (see the wiki)
  • Is actively maintained
  • Doesn't have the problems that the Microsoft Anti-XSS library has
  • Is unit tested with the OWASP XSS Filter Evasion Cheat Sheet
  • Is purpose-built for sanitizing (in contrast to HTML Agility Pack, which is a parser - not a sanitizer)
  • Doesn't use regular expressions (HTML isn't a regular language!)

Also on NuGet
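
For anyone wondering what using it looks like, here is a minimal sketch. It assumes the HtmlSanitizer NuGet package is installed and that the namespace is Ganss.Xss (older releases used Ganss.XSS). The SanitizerDemo class and Clean method are names made up for this illustration.

```csharp
using Ganss.Xss; // assumption: recent package version; older versions use Ganss.XSS

public static class SanitizerDemo
{
    // Hypothetical helper: returns the input with disallowed markup removed.
    public static string Clean(string untrustedHtml)
    {
        var sanitizer = new HtmlSanitizer();

        // Example of customizing the defaults: additionally disallow <img> elements.
        sanitizer.AllowedTags.Remove("img");

        return sanitizer.Sanitize(untrustedHtml);
    }
}
```

Calling SanitizerDemo.Clean on a string that contains a script element should give back the markup with the script removed, while ordinary tags such as p and b are kept by the default configuration.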


Based on the comment you made to this answer, you might find some useful info in this question:
https://stackoverflow.com/questions/72394/what-should-a-developer-know-before-building-a-public-web-site

Here's a parameterized query example. Instead of this:

string sql = "UPDATE UserRecord SET FirstName='" + txtFirstName.Text + "' WHERE UserID=" + UserID;

Do this:

// Parameters are sent separately from the SQL text, so user input is never parsed as SQL.
SqlCommand cmd = new SqlCommand("UPDATE UserRecord SET FirstName = @FirstName WHERE UserID = @UserID");
cmd.Parameters.Add("@FirstName", SqlDbType.VarChar, 50).Value = txtFirstName.Text;
cmd.Parameters.Add("@UserID", SqlDbType.Int).Value = UserID;

Edit: Since there was no injection, I removed the portion of the answer dealing with that. I left the basic parameterized query example, since that may still be useful to anyone else reading the question.
--Joel


If by sanitize you mean REMOVE the tags entirely, the RegEx example referenced by Bryant is the type of solution you want.
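
The example Bryant referenced isn't reproduced here, so the following is only a sketch of the general kind of regex-based tag stripping, not that exact code. The pattern and the TagStripper / StripTags names are assumptions. As another answer points out, HTML isn't a regular language, so a pattern like this can be fooled by malformed markup.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: removes anything that looks like <...> and keeps the text between tags.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<[^>]*>", string.Empty);
    }
}
```

For example, TagStripper.StripTags("<p>Hello <b>world</b></p>") returns "Hello world".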

If you just want to ensure that the code DOESN'T mess with your design or get rendered as markup for the user, you can use the HttpUtility.HtmlEncode method to prevent that!
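
A minimal sketch of the encoding approach, assuming a project where System.Web's HttpUtility is available. The EncodeDemo class and ToDisplayText method are names invented for this example.

```csharp
using System.Web; // HttpUtility lives in this namespace

public static class EncodeDemo
{
    // Hypothetical helper: encodes user text so the browser shows it literally
    // instead of interpreting it as HTML.
    public static string ToDisplayText(string userInput)
    {
        return HttpUtility.HtmlEncode(userInput);
    }
}
```

For example, EncodeDemo.ToDisplayText("<b>hi</b>") returns "&lt;b&gt;hi&lt;/b&gt;", which the browser displays as the literal text <b>hi</b> rather than as bold text.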


What about using the Microsoft Anti-Cross Site Scripting Library?