How can I let users post HTML-formatted data safely?

I have a textarea and I want to support some very simple formatting for the posted data (at least whitespace and line breaks).

How can I achieve this? If I don't escape the response and keep some HTML tags, it'll be a huge security hole. But I don't see any other solution that allows text formatting in the browser.

So I should probably filter the user's input. But how can I do this? Are there any ready-to-use solutions? I'm using JSF, so is there a smart component that filters out everything except the allowed HTML tags?


Use an HTML parser that supports filtering HTML against a whitelist, such as Jsoup. Here's the relevant extract from its site.

Sanitize untrusted HTML

Problem

You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.

Solution

Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.

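import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // jsoup 1.14.2+ renames this class to Safelist

String unsafe =
      "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
      // now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>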

Then, to display it with whitespace preserved, apply the CSS rule white-space: pre-wrap; to the HTML element in which you display it.
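For the simplest case in the question (only whitespace and line breaks, no tags at all), it may be enough to escape everything and rely on the CSS rule alone. Below is a rough sketch of that idea in plain Java. The class and method names are made up for illustration and are not part of Jsoup or JSF. In JSF, h:outputText already escapes by default, so a helper like this would mainly matter for markup generated outside of it.

```java
// Hypothetical helper: escapes all HTML so user text is rendered literally,
// and leaves whitespace handling to the CSS white-space: pre-wrap rule.
final class PlainTextEscaper {
    private PlainTextEscaper() {}

    /** Escapes the five HTML-significant characters. */
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps the escaped text in an element that keeps spaces and line breaks. */
    static String toPreWrappedHtml(String input) {
        return "<div style=\"white-space: pre-wrap;\">" + escapeHtml(input) + "</div>";
    }
}
```

With this approach no user-supplied tag survives, so there is nothing to whitelist. The trade-off is that users get no formatting beyond spacing and line breaks.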

No all-in-one JSF component comes to mind.


Is there some reason why you need to accept HTML instead of some other markup language, such as Markdown (which is what Stack Overflow uses)?

http://daringfireball.net/projects/markdown/

Not sure what kind of tags you'd want to accept that wouldn't be covered by Markdown or a similar formatting language...
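To illustrate why a lightweight markup can be safer than accepting raw HTML, here is a toy sketch in plain Java. It is not Markdown and not a real parser: it only understands **bold** and line breaks, and the class name TinyMarkup is invented for this example. The point is the order of operations: escape all user input first, then generate only tags the server itself chose. For real use, an existing Markdown library would probably be a better fit.

```java
import java.util.regex.Pattern;

// Toy illustration only: a made-up two-rule markup, not actual Markdown.
final class TinyMarkup {
    private static final Pattern BOLD = Pattern.compile("\\*\\*(.+?)\\*\\*");

    private TinyMarkup() {}

    /** Escapes HTML-significant characters; '&' must be replaced first. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    /** Escapes the input, then emits only <strong> and <br/> tags. */
    static String render(String input) {
        String html = escapeHtml(input.replace("\r\n", "\n"));
        html = BOLD.matcher(html).replaceAll("<strong>$1</strong>");
        return html.replace("\n", "<br/>");
    }
}
```

Because escaping happens before any tag is produced, anything the user typed that looks like HTML ends up as literal text, and the only markup in the output comes from the two fixed rules.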