In MySQL, which collation should I choose?
Solution 1:
A collation tells the database how to compare and sort strings. It must belong to the character set you use.
If you use UTF-8, the collation should be utf8_general_ci. This sorts case-insensitively in roughly Unicode order and works for most languages. It also preserves ASCII and Latin1 order.
Before MySQL 8.0, the default character set is normally latin1, whose default collation is latin1_swedish_ci.
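To make "sorting" concrete, here is a small Python sketch (not MySQL itself) that contrasts a byte-wise ordering with a case-insensitive one. It only approximates the difference between a binary collation and a `_ci` collation; the word list is made up for illustration.

```python
# Hypothetical sample data, chosen so that case changes the order.
words = ["banana", "apple", "Cherry"]

# Code-point order: uppercase ASCII letters sort before lowercase ones,
# roughly what a binary (_bin) collation would give for ASCII text.
binary_order = sorted(words)

# Case-insensitive order: fold case before comparing,
# a rough analogy for a _ci collation.
ci_order = sorted(words, key=str.casefold)

print(binary_order)
print(ci_order)
```

With the byte-wise ordering "Cherry" comes first because "C" has a lower code point than "a"; with case folding it moves to the end, which is what most users expect.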
Solution 2:
"Collation" is not itself the default; you are being offered the default collation as the first choice.
What we're talking about is the collation, that is, the comparison and sorting rules applied to the character set your database uses in its text types. Your default option is usually based on regional settings, so unless you're planning to globalize, that's usually peachy-keen.
Collations also determine case and accent sensitivity (e.g., is 'Big' == 'big'? With a _ci collation, it is). Check out the MySQL list for all the options.
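As a rough illustration of what case and accent insensitivity mean, the following Python sketch folds strings before comparing them. It is an analogy only: the helper names `fold_ci_ai`, `equal_ci_ai`, and `equal_cs` are hypothetical, and MySQL's collations use their own weight tables rather than this logic.

```python
import unicodedata


def fold_ci_ai(text: str) -> str:
    """Fold case and strip accents (rough stand-in for a case- and accent-insensitive collation)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def equal_ci_ai(a: str, b: str) -> bool:
    """Compare two strings ignoring case and accents."""
    return fold_ci_ai(a) == fold_ci_ai(b)


def equal_cs(a: str, b: str) -> bool:
    """Compare two strings exactly, as a case-sensitive comparison would."""
    return a == b


print(equal_ci_ai("Big", "big"))
print(equal_cs("Big", "big"))
print(equal_ci_ai("résumé", "RESUME"))
```

The first comparison treats 'Big' and 'big' as equal, the second does not, and the third shows accents being ignored as well.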
Solution 3:
Short answer: always use utf8mb4 (specifically utf8mb4_unicode_ci) when dealing with collation in MySQL & MariaDB.
Long answer:
MySQL’s utf8 encoding is awkwardly named, as it’s different from proper UTF-8 encoding. It doesn’t offer full Unicode support, which can lead to data loss or security vulnerabilities.
Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode.
Read the full text here: https://mathiasbynens.be/notes/mysql-utf8mb4
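The gap comes down to bytes per character: MySQL's utf8 stores at most three bytes per character, while some characters (emoji, for example) need four in real UTF-8. A minimal Python sketch, with hypothetical helper names, for checking whether a string would need utf8mb4:

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes any single character in text needs."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if text contains a character that needs four bytes in UTF-8,
    which is more than the three bytes per character MySQL's utf8 can store."""
    return max_utf8_bytes(text) > 3


print(needs_utf8mb4("日本語"))
print(needs_utf8mb4("I 😀 MySQL"))
```

CJK text such as "日本語" fits in three bytes per character, so it is the emoji in the second string that requires utf8mb4.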
As to which specific utf8mb4 collation to choose, go with utf8mb4_unicode_ci so that sorting is always handled properly, with minimal/unnoticeable performance drawbacks. See more details here: What's the difference between utf8_general_ci and utf8_unicode_ci?