PHP to clean up pasted Microsoft input

HTML Purifier will create standards-compliant markup and filter out many possible attacks (such as XSS).

For faster cleanups that don't require XSS filtering, I use the PECL extension Tidy, which is a binding for the HTML Tidy utility.
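As a rough sketch of the Tidy route, something like the following could work. The function name `tidyPastedHtml` and the particular set of options are my own assumptions, not a recommended configuration; `word-2000` and `bare` are the HTML Tidy options aimed at Microsoft markup, and the result will depend on the Tidy version installed.

```php
<?php
// Hypothetical helper: run pasted markup through the tidy extension.
// Requires the tidy extension to be installed and enabled.
function tidyPastedHtml(string $html): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not loaded.');
    }

    // Option choice is an assumption; adjust to taste.
    $config = [
        'word-2000'      => true, // strip much of the markup Word adds
        'bare'           => true, // remove Microsoft-specific cruft
        'show-body-only' => true, // return a fragment, not a full document
        'output-xhtml'   => true,
        'wrap'           => 0,    // do not hard-wrap lines
    ];

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}
```

Keep in mind that this only repairs and tidies the markup; it is not a security filter.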

If those don't help you, I suggest you switch to FCKeditor, which has this feature built in.


In my case, this worked just fine:

$text = strip_tags($text, '<p><a><em><span>');

Rather than trying to pull out the stuff you don't want, such as embedded Word XML, you can just specify your allowed tags.
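To make the whitelist idea a bit more concrete, here is a small sketch wrapped in a function. The name `cleanPastedText` and the sample input are made up for illustration. One caveat worth knowing: `strip_tags` keeps the attributes on the tags you allow, so inline `style` or `class` attributes from Word survive, and this is not a substitute for XSS filtering.

```php
<?php
// Hypothetical wrapper around strip_tags with a whitelist of tags.
function cleanPastedText(string $text, string $allowedTags = '<p><a><em><span>'): string
{
    // Every tag not listed in $allowedTags is removed; its inner text is kept.
    return strip_tags($text, $allowedTags);
}

// Word-style input: <b> and the empty <o:p> element are not whitelisted.
$dirty = '<p>Hi <b>there</b><o:p></o:p></p>';
echo cleanPastedText($dirty), "\n";

// Attributes on allowed tags are left untouched.
echo cleanPastedText('<span style="mso-x">a</span>'), "\n";
```

If the leftover attributes are a problem, a pass through HTML Purifier or Tidy afterwards is probably the safer choice.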