Is mysql_real_escape_string() broken?
Solution 1:
From the description of MySQL’s C API function mysql_real_escape_string():
If you need to change the character set of the connection, you should use the mysql_set_character_set() function rather than executing a SET NAMES (or SET CHARACTER SET) statement. mysql_set_character_set() works like SET NAMES but also affects the character set used by mysql_real_escape_string(), which SET NAMES does not.
So don’t use SET NAMES / SET CHARACTER SET but PHP’s mysql_set_charset to change the encoding, as that is the counterpart to MySQL’s mysql_set_character_set (see the source code of /ext/mysql/php_mysql.c).
Solution 2:
However, even with legacy code and old server versions, the vulnerability can only be triggered if the character set of the database connection is changed from a single-byte one like Latin-1 to a multibyte one that allows the value 0x5c (ASCII backslash) in the second or later byte of a multibyte character.
Specifically, UTF-8 does not allow that, unlike older Asian encodings like GBK and SJIS. So if your application does not change the connection character set, or changes it only to UTF-8 or single-byte ones like Latin-n, you're safe from this exploit.
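To make the byte-level argument concrete, here is a rough Python sketch. It is only a simulation: escape_single_byte is a hypothetical stand-in for an escaper that is unaware of the connection's multibyte character set, not MySQL's actual code, and the payload is just the commonly cited 0xbf 0x27 example.

```python
# Simulation only: a byte-level model of the problem, not MySQL's implementation.

def escape_single_byte(raw: bytes) -> bytes:
    """Hypothetical charset-unaware escaper: puts 0x5c in front of ' " and \\ bytes."""
    out = bytearray()
    for b in raw:
        if b in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(b)
    return bytes(out)


# Input containing the byte 0xbf followed by a single quote (0x27).
payload = b"\xbf\x27 OR 1=1"
escaped = escape_single_byte(payload)  # 0xbf 0x5c 0x27 ...

# Read as GBK, 0xbf 0x5c forms one two-byte character, so the inserted
# backslash disappears into it and the quote is left unescaped.
as_gbk = escaped.decode("gbk")
quote_is_unescaped = as_gbk[1] == "'" and "\\" not in as_gbk


def utf8_never_hides_backslash() -> bool:
    """Check that no non-ASCII code point has 0x5c anywhere in its UTF-8 encoding."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates cannot be encoded
            continue
        if 0x5C in chr(cp).encode("utf-8"):
            return False
    return True
```

With GBK the escaping backslash gets swallowed as the trailing byte of a character, while in UTF-8 every byte of a multibyte sequence is 0x80 or above, so a 0x5c byte is always a real backslash.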
But best practice is still to run the newest server version, use the correct interface to change character sets, and use prepared queries so you don't forget to escape stuff.
Solution 3:
In the comments, there is a link to a bugfix in MySQL 5.0.22 (24 May 2006), where this has been addressed.