PHP & mySQL: When exactly to use htmlentities?

PLATFORM: PHP & mySQL

For experimentation purposes, I have tried out a few XSS injections on my own website. Consider a form with a textarea input. As this is a textarea, I can enter text and all sorts of (English) characters. Here are my observations:

A) If I apply only strip_tags and mysql_real_escape_string, without htmlentities, to my input just before inserting the data into the database, the query breaks and I get an error that exposes my table structure because of the abnormal termination.

B) If I apply strip_tags, mysql_real_escape_string and htmlentities to my input just before inserting the data into the database, the query does NOT break and I can successfully insert data from the textarea into my database.

So I understand that htmlentities must be used at all costs, but I am unsure when exactly it should be used. With the above in mind, I would like to know:

  1. When exactly should htmlentities be used? Should it be applied just before inserting the data into the DB, or should the data go into the DB as-is, with htmlentities applied when I show the data from the DB?

  2. If I follow the method described in point B) above (which I believe is the most obvious and efficient solution in my case), do I still need to apply htmlentities when I am trying to show the data from the DB? If so, why? If not, why not? I ask this because it's really confusing for me after I have gone through the post at: http://shiflett.org/blog/2005/dec/google-xss-example

  3. Then there is one more PHP function: html_entity_decode. Can I use it to show my data from the DB (after following the procedure in point B), given that htmlentities was applied to my input? Which of html_entity_decode and htmlentities should I prefer, and when?

PREVIEW PAGE:

I thought it might help to add some details of a specific situation. Consider a 'Preview' page. When I submit the input from a textarea, the Preview page receives the input and shows it as HTML, and at the same time a hidden input collects this input. When the submit button on the Preview page is hit, the data from the hidden input is POSTed to a new page, and that page inserts the data contained in the hidden input into the DB. If I do not apply htmlentities when the form is initially submitted (but apply only strip_tags and mysql_real_escape_string) and there is malicious input in the textarea, the hidden input is broken and its last few characters show up on the page as " />, which is undesirable. So I need to preserve the integrity of the hidden input on the Preview page and still collect the data in it without breaking it. How do I go about this? Apologies for the delay in posting this info.

Thank you in advance.


Solution 1:

Here's the general rule of thumb.

Escape variables at the last possible moment.

You want your variables to be clean representations of the data. That is, if you are trying to store the last name of someone named "O'Brien", then you definitely don't want these:

O&#039;Brien
O\'Brien

... because, well, that's not his name: there are no ampersands or slashes in it. When you take that variable and output it in a particular context (e.g. insert it into an SQL query, or print it to an HTML page), that is when you modify it.

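// The raw value, kept exactly as the user entered it.
$name = "O'Brien";

// SQL context: escape only while building the query.
$sql = "SELECT * FROM people "
     . "WHERE lastname = '" . mysql_real_escape_string($name) . "'";

// HTML context: encode only while building the markup.
$html = "<div>Last Name: " . htmlentities($name, ENT_QUOTES) . "</div>";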

You never want to have htmlentities-encoded strings stored in your database. What happens when you want to generate a CSV or PDF, or anything which isn't HTML?

Keep the data clean, and only escape for the specific context of the moment.
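
To illustrate the CSV point, here is a minimal sketch that sends the same raw value to two output contexts. The helper names toHtml and toCsvLine and the sample city field are made up for this illustration; only htmlentities comes from the answer above.

```php
<?php
// Raw value, exactly as it would be stored in the database.
$name = "O'Brien";

// HTML context: encode at the moment of printing into markup.
// (toHtml is a hypothetical helper name.)
function toHtml(string $value): string
{
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

// CSV context: fputcsv() applies CSV quoting; no HTML entities are involved.
// (toCsvLine is a hypothetical helper name.)
function toCsvLine(array $fields): string
{
    $handle = fopen('php://memory', 'r+');
    fputcsv($handle, $fields, ',', '"', '\\');
    rewind($handle);
    $line = stream_get_contents($handle);
    fclose($handle);
    return rtrim($line, "\n");
}

echo '<div>Last Name: ' . toHtml($name) . "</div>\n";
echo toCsvLine([$name, 'Dublin, Ireland']) . "\n";
```

If the database held the HTML-encoded form instead, the CSV line would contain the entity text rather than the apostrophe, and every non-HTML consumer would have to decode it first.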

Solution 2:

  1. Only right before you print a value into HTML (no matter whether it comes from the DB or from $_GET/$_POST). htmlentities has nothing to do with the database.
  2. B is overkill. You should apply mysql_real_escape_string before inserting into the DB, and htmlentities before printing to HTML. You don't need to strip tags: after htmlentities, tags are displayed on screen as literal text such as <br />, etc.

Theoretically you could apply htmlentities before inserting into the DB, but this makes further data processing harder if you ever need the original text.

  3. See above.
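
The same rule seems to cover the Preview page from the question, since a hidden input's value attribute is also HTML output. A rough sketch, assuming a hypothetical helper hiddenField and a made-up field name "comment":

```php
<?php
// Hypothetical helper: render raw text as a hidden form field.
// Encoding with ENT_QUOTES stops a double quote in the value from
// closing the attribute early.
function hiddenField(string $name, string $rawValue): string
{
    return '<input type="hidden" name="' . htmlentities($name, ENT_QUOTES, 'UTF-8')
         . '" value="' . htmlentities($rawValue, ENT_QUOTES, 'UTF-8') . '" />';
}

// Input that would otherwise terminate the value attribute and the tag.
$comment = '"/><script>alert(1)</script>';

// Visible preview: encoded for the HTML body.
echo '<p>' . nl2br(htmlentities($comment, ENT_QUOTES, 'UTF-8')) . "</p>\n";

// Hidden copy: encoded for the attribute, raw value untouched.
echo hiddenField('comment', $comment) . "\n";
```

The browser decodes the entities when it reads the attribute, so the value POSTed from the Preview page is the raw text again. The insert page can then escape it for SQL only, and html_entity_decode is not needed anywhere in this flow.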

Solution 3:

In essence, you should use mysql_real_escape_string prior to database insertion (to prevent SQL injection) and then htmlentities, etc. at the point of output.
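
As a side note, the mysql_* functions were removed in PHP 7.0. A commonly used alternative, swapped in here and not what this answer describes, is a prepared statement, which keeps the value out of the SQL text altogether. The sketch below assumes PDO and reuses the "people" table and "lastname" column from the earlier example; the function name findPeopleByLastName is hypothetical.

```php
<?php
// Hypothetical helper: look up rows by last name with a bound parameter.
function findPeopleByLastName(PDO $pdo, string $lastName): array
{
    // The :lastname placeholder is sent separately from the SQL text,
    // so quotes in the value cannot change the query structure.
    $stmt = $pdo->prepare('SELECT lastname FROM people WHERE lastname = :lastname');
    $stmt->execute([':lastname' => $lastName]);

    // Return the raw, unencoded values; HTML encoding happens at output.
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

The caller would pass an already connected PDO instance, and the returned strings still need htmlentities when they are printed into a page.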

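You'll also want to apply sanity checking to all user input to ensure (for example) that numerical values are really numeric. Keep in mind that is_int and is_float test the type of a variable, and values from $_GET/$_POST always arrive as strings, so they return false even for input like "42"; is_numeric, ctype_digit and filter_var are usually a better fit at this point. (See the variable handling functions section of the PHP manual for more information on these functions and other similar ones.)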
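
A minimal sketch of such a check, assuming a hypothetical helper parseIntInRange and made-up bounds; filter_var with FILTER_VALIDATE_INT is a standard PHP function.

```php
<?php
// Hypothetical helper: return the integer if $raw is a whole number
// within [$min, $max], otherwise null.
function parseIntInRange(string $raw, int $min, int $max): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);

    // filter_var() returns false when validation fails.
    return $value === false ? null : $value;
}

var_dump(is_int('42'));                      // bool(false): form input is a string
var_dump(parseIntInRange('42', 1, 120));     // int(42)
var_dump(parseIntInRange('42abc', 1, 120));  // NULL
```

Validation of this kind complements escaping and does not replace it: a value that passes the check is still escaped for SQL or HTML when it is used.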