Prevent XSS with strip_tags()?

I strongly disagree it's "academically better".

  • It breaks user input (imagine how useless StackOverflow would be for this discussion if they "cleaned" posts from all tags).

  • Text inserted in HTML with only tags stripped will be invalid. HTML requires & to be escaped as well.

  • It's not even safe in HTML! strip_tags() is not enough to protect values in attributes, e.g., <input value="$foo"> might be exploited with $foo = " onfocus="evil() (no < or > needed!)

So the correct solution is to escape data according to the requirements of the language you're generating. When you have plain text and you're generating HTML, you should convert the text to HTML with htmlspecialchars() or a similar function. When you're generating e-mail, you should convert the text to quoted-printable format, and so on.
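To make the attribute case above concrete, here is a minimal sketch. The value of $foo is the hypothetical payload from the bullet above, and the variable names $stripped and $escaped are only for illustration:

```php
<?php
// Hypothetical attacker-controlled value from the attribute example above.
$foo = '" onfocus="evil()';

// strip_tags() finds no tags here, so the payload passes through unchanged.
$stripped = strip_tags($foo);

// htmlspecialchars() turns each double quote into &quot;, so the value
// can no longer close the attribute it is placed in.
$escaped = htmlspecialchars($foo, ENT_QUOTES, 'UTF-8');

echo '<input value="' . $stripped . '">', "\n"; // attribute is broken out of
echo '<input value="' . $escaped . '">', "\n";  // payload stays inside value=""
```

The first line of output carries a live onfocus handler; the second keeps the whole string as the input's value.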


strip_tags itself is not going to be sufficient as it removes perfectly valid, non-HTML content. For instance:

<?php
 echo strip_tags("This could be a happy clown *<:) or a puckered face.\n");
 // ...
 echo strip_tags("Hey guys <--- look at this!\n");

Will output:

This could be a happy clown *

And:

Hey guys

Everything after the initial < gets removed. Very annoying for end users! Disallowing reserved HTML characters would be a bad move. Instead, these characters need to be escaped with htmlentities() or a similar function when they are output inside HTML.
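For comparison, a short sketch of what escaping does to the same clown text (the variable names $comment and $safe are just for illustration):

```php
<?php
$comment = "This could be a happy clown *<:) or a puckered face.";

// Escape instead of stripping: nothing the user typed is lost.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// The browser renders &lt; as a literal <, so the user sees the original text.
```

Decoding $safe with htmlspecialchars_decode() gives back the original string, which is a quick way to confirm the escaping is lossless.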

You need something more advanced than strip_tags - HTML Purifier works great and will allow users to use HTML reserved characters.


As others have mentioned, you can use a combination of strip_tags and htmlspecialchars to protect yourself against XSS.

One bad thing about strip_tags is that it might remove harmless content that the user will not expect. I see techies write stuff like: <edit> foo </edit>, where they fully expect those tags to be seen as is. Also, I've seen "normal" people even do things like <g> for "grin." Again, they will think it's a bug if that doesn't show up.

So personally, I avoid strip_tags in preference for my own parser that allows me to explicitly enable certain safe HTML tags, attributes and CSS, explicitly disable unsafe tags and attributes, and convert any other special character to harmless versions. Thus the text is always seen as one would expect.

If I didn't have that parser at my disposal, I would simply use htmlspecialchars to safely encode the text.


It should; I have never heard of that 0 trick before. But you can always run strip_tags and then htmlspecialchars just to be safe. Good practice would be to test this yourself on your application, since you know what type of data it accepts and can check whether any input breaks it. Just search for known XSS exploit methods and use them as your test data. I would check at least weekly for new vulnerabilities and continually test your script against the new exploits that come out.
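A rough sketch of that kind of self-test follows. render_comment() is a hypothetical stand-in for however your application outputs user text, and the three payloads are only a starting point, not a complete list:

```php
<?php
// Hypothetical helper: how the application is assumed to render user text.
function render_comment(string $text): string
{
    return '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</p>';
}

// A few well-known payload shapes; a real test list would be much longer.
$payloads = [
    '<script>alert(1)</script>',
    '<img src=x onerror=alert(1)>',
    '" onfocus="evil()',
];

$failures = [];
foreach ($payloads as $payload) {
    $html = render_comment($payload);
    // Drop the <p>...</p> wrapper we added ourselves, then look for
    // any raw markup character that survived escaping.
    $inner = substr($html, 3, -4);
    if (preg_match('/[<>"\']/', $inner)) {
        $failures[] = $payload;
    }
}

if (count($failures) === 0) {
    echo "all payloads escaped\n";
} else {
    echo "unescaped: " . implode(', ', $failures) . "\n";
}
```

This only checks output placed in HTML text or quoted attributes; other contexts (URLs, inline JavaScript, CSS) need their own checks.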


Need help treating HTML as plain text within the document? Need to echo the value of an attribute without being vulnerable to XSS attacks like <input value="<?php echo '" onkeydown="alert(&quot;XSS&quot;)'; ?>" />?

Use htmlentities().

echo htmlentities('<p>"..."</p>');
// result: &lt;p&gt;&quot;...&quot;&lt;/p&gt;

No strip_tags() required, as this function already replaces < and > with the &lt; and &gt; entities.

What's the difference between htmlentities() and htmlspecialchars() you may ask?

Well, htmlentities() will encode ANY character that has an HTML entity equivalent, while htmlspecialchars() ONLY encodes a small set of the most problematic characters: &, <, >, " and (when ENT_QUOTES is set) '.
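A small sketch of the difference, assuming UTF-8 input (the sample string and variable names are made up for illustration):

```php
<?php
$text = 'café <b> & "quotes"';

// Only the markup-significant characters are encoded; é is left alone.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Every character with a named entity is encoded, including é.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // café &lt;b&gt; &amp; &quot;quotes&quot;
echo $entities, "\n"; // caf&eacute; &lt;b&gt; &amp; &quot;quotes&quot;
```

Both are equally safe against the injection shown above; with a UTF-8 page, the extra entities from htmlentities() only make the output longer.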