Inserting UTF-8 encoded string into UTF-8 encoded mysql table fails with "Incorrect string value"

Inserting UTF-8 encoded string into UTF-8 encoded table gives incorrect string value.

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9D\x84\x8E i...' for column 'body_value' at row 1: INSERT INTO

I have a 𝄎 character, in a string that mb_detect_encoding claims is UTF-8 encoded. I try to insert this string into a MySQL table, which is defined as (among other things) DEFAULT CHARSET=utf8

Edit: Drupal always does SET NAMES utf8 with optional COLLATE (atleast when talking to MySQL).

Edit 2: Some more details that appear to be relevant. I grab some text from a PostgreSQL database. I stick it onto an object, use mb_detect_encoding to verify that it's UTF-8, and persist the object to the database, using node_save. So while there is an HTTP request that triggers the import, the data does not come from the browser.

Edit 3: Data is denormalized over two tables:

SELECT character_set_name FROM information_schema.COLUMNS C WHERE table_schema = "[database]" AND table_name IN ("field_data_body", "field_revision_body") AND column_name = "body_value";

>+--------------------+
| character_set_name |
+--------------------+
| utf8               |
| utf8               |
+--------------------+

Edit 4: Is it possible that the character is "to new"? I'm more than a little fuzzy on the relationship between unicode and UTF-8, but this wikipedia article, implies that the character was standardized very recently.

I don't understand how that can fail with "Incorrect string value".


Solution 1:

𝄎 (U+1D10E) is a character Unicode found outside the BMP (Basic Multilingual Plane) (above U+FFFF) and thus can't be represented in UTF-8 in 3 bytes. MySQL charset utf8 only accepts UTF-8 characters if they can be represented in 3 bytes. If you need to store this in MySQL, you'll need to use MySQL charset utf8mb4. You'll need MySQL 5.5.3 or later. You can use ALTER TABLE to change the character set without much problem; since it needs more space to store the characters, a couple issues show up that may require you to reduce string size. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html .

Solution 2:

to solve this issue, first you change your database field to utf8m4b charset. For example:

ALTER TABLE `tb_name` CHANGE `field_name` `field_name` VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL; 

then in your db connection, set driver_options for it to utf8mb4. For example, if you use PDO

$db = new PDO('mysql:host=localhost;dbname=testdb;charset=utf8mb4', 'username', 'password');

or in zend framework 1.2

$dbParam = array('host' => 'localhost', 'username' => 'db_user_name',
            'password' => 'password', 'dbname' => 'db_name',
            'driver_options' => array(
                '1002' => "SET NAMES 'utf8mb4'",
                '12'    => 0 //this is not necessary
            )
        );

Solution 3:

In your PDO connecton, set the charset.

new PDO('mysql:host=localhost;dbname=the_db;charset=utf8mb4', $user, $password);

Solution 4:

I fixed the error: SQLSTATE[HY000]: General error: 1366 Incorrect string value ...... with this method:

I use utf8mb4_unicode_ci for database database Set utf8mb4_unicode_ci for all tables tables

Set longblog datatype for column (not text, longtext.... you need big datatype to store 4 bytes of your content) fields

It is okay now. If you use laravel, continue to edit config/database.php

'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',

laravel

If you use function strtolower, replace it with mb_strtolower Notice: you have to put <meta charset="utf-8"> on your head tag