PHP/PDO/MySQL: inserting into MEDIUMBLOB stores bad data

I have a simple PHP web app that accepts icon images via file upload and stores them in a MEDIUMBLOB column.

On my machine (Windows) plus two Linux servers, this works fine. On a third Linux server, the inserted image is corrupted: unreadable after a SELECT, and the length of the column data as reported by the MySQL length() function is about 40% larger than the size of the uploaded file.

(Each server connects to a separate instance of MySQL.)

Of course, this leads me to think about encoding and character set issues. BLOB columns have no associated charsets, so it seems like the most likely culprit is PDO and its interpretation of the parameter value for that column.

  • I've tried using bindValue with PDO::PARAM_LOB, to no effect.
  • I've verified that the images are being received on the server correctly (i.e. am reading them post-upload with no problem), so it's definitely a DB/PDO issue.
  • I've searched for obvious configuration differences between the servers, but I'm not an expert in PHP configuration so I might have missed something.

The insert code is pretty much as follows:

$imagedata = file_get_contents($_FILES["icon"]["tmp_name"]);
$stmt = $pdo->prepare('insert into foo (theimage) values (:theimage)');
$stmt->bindValue(':theimage', $imagedata, PDO::PARAM_LOB);
$stmt->execute();

Any help will be really appreciated.

UPDATE: The default MySQL charset on the problematic server is utf8; it's latin1 on the others.

The problem is "solved" by adding PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES latin1 COLLATE latin1_general_ci" to the PDO constructor.

This seems like a bug poor design to me: why should the charset of the connection have any effect on data for a binary column, particularly when it's been identified as binary to PDO itself with PARAM_LOB?

Note that the DB tables are defined as latin1 in all cases: it's only the servers' default charsets that are inconsistent.


Solution 1:

This seems like a bug to me: why should the charset of the connection have any effect on data for a binary column, particularly when it's been identified as binary to PDO itself with PARAM_LOB?

I do not think that this must be a bug. I can imagine that whenever the client talks with the server and says that the following command is in UTF-8 and the server needs it in Latin-1, then the query might get re-encoded prior parsing and execution. So this is an encoding issue for the transportation of the data. As the whole query prior parsing will get influenced by this re-encoding, the binary data for the BLOB column will get changed as well.

From the Mysql manual:

What character set should the server translate a statement to after receiving it?

For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

Or on the way back: Latin1 data from the store will get converted into UTF-8 because the client told the server that it prefers UTF-8 for the transportation.

The identifier for PDO itself you name looks like being something entirely different:

PDO::PARAM_LOB tells PDO to map the data as a stream, so that you can manipulate it using the PHP Streams API. (Ref)

I'm no MySQL expert but I would explain it this way. Client and server need to negotiate which charsets they are using and I assume they do this for a reason.