Best way to make sure a MySQL database is fully in UTF8

Solution 1:

Rory, first of all, you are right to want to monitor what gets created in your databases. While we all implement steps to prevent mistakes, you cannot assume that mistakes won't creep in. I do something very similar, as most of our infrastructure demands UTF8.

The following queries are good for checking stats:

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.SCHEMATA GROUP BY DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME;
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_COLLATION, COUNT(0) AS COUNT FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL GROUP BY TABLE_COLLATION;
# to filter schema use TABLE_SCHEMA in the where clause
SELECT CHARACTER_SET_NAME, COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME IS NOT NULL AND COLLATION_NAME IS NOT NULL GROUP BY CHARACTER_SET_NAME, COLLATION_NAME;
# to filter schema use TABLE_SCHEMA in the where clause

The following queries are well suited to a cron job that emails you if any results are found:

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE DEFAULT_CHARACTER_SET_NAME NOT LIKE '%utf8%' OR DEFAULT_COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL AND TABLE_COLLATION NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME NOT LIKE '%utf8%' OR COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause

These queries work for MySQL >= 5.0. If you need help writing the cron job, let me know.
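
As a rough sketch of such a cron job: the script below runs the column-level check and mails the rows only when something is found. The function name report_non_utf8, the ALERT_TO variable, and the mail subject are made up for illustration. It assumes the mysql client can read its credentials from ~/.my.cnf and that a mail command is installed. Skipping the system schemas is also an assumption on my part, so adjust the list to taste.

```shell
#!/bin/sh
# Sketch of a cron-driven check: mail any column whose character set or
# collation is not utf8. Names here (report_non_utf8, ALERT_TO) are
# hypothetical; credentials are assumed to come from ~/.my.cnf.

report_non_utf8() {
    recipient=$1

    # Column-level check, with the system schemas filtered out (assumption).
    query="SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
           FROM information_schema.COLUMNS
           WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
             AND (CHARACTER_SET_NAME NOT LIKE '%utf8%' OR COLLATION_NAME NOT LIKE '%utf8%');"

    # --batch gives tab-separated rows and --skip-column-names drops the
    # header, so empty output means nothing was found.
    result=$(mysql --batch --skip-column-names -e "$query") || return 1

    if [ -n "$result" ]; then
        printf '%s\n' "$result" | mail -s "Non-UTF8 columns found" "$recipient"
    fi
}

# Run only when a recipient is configured, e.g. ALERT_TO=dba@example.com
if [ -n "${ALERT_TO:-}" ]; then
    report_non_utf8 "$ALERT_TO"
fi
```

Saved under a path of your choosing and made executable, it could be scheduled with a crontab entry that sets ALERT_TO in front of the command. The schema-level and table-level queries can be handled the same way by swapping the query text.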

I also have a problem with my developers (mostly the Ruby on Rails team) creating InnoDB tables when they don't need them. To keep this in check, I monitor the staging database with a cron job that runs:

SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE ENGINE = 'InnoDB';

Solution 2:

I'd do something like mysqldump --all-databases --no-data | grep -i character

Solution 3:

Jure1873 suggests using grep, and I commented that awk was more appropriate. Here is what I threw together.

mysqldump --all-databases --no-data | awk '{gsub("[`;]","");};/^USE /{db=$2;};/CREATE TABLE /{table=$3;};/^\) ENGINE/{sub(".*CHARSET=","");if(!match($1,"utf8")){print db ":" table ":" $1;}}'

Here it is with increased readability:

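mysqldump --all-databases --no-data | \
awk '
    # strip backticks and semicolons from every line
    {
        gsub("[`;]","");
    }
    # remember the current database
    /^USE /{
        db=$2;
    }
    # remember the current table
    /CREATE TABLE /{
        table=$3;
    }
    # closing line of a table definition: report a non-utf8 charset
    /^\) ENGINE/{
        sub(".*CHARSET=","");
        if(!match($1,"utf8")){
            print db ":" table ":" $1;
        }
    }
'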
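
To see what the filter prints without touching a live server, it can be fed a small hand-written schema dump. In the sketch below, the sample text only imitates mysqldump --no-data output, and the database and table names are invented. The function names find_non_utf8_tables and sample_dump are hypothetical wrappers around the same awk program, lightly reformatted.

```shell
#!/bin/sh
# The awk filter wrapped in a function that reads a schema dump on stdin,
# so it can be tried on sample text instead of a live server.
find_non_utf8_tables() {
    awk '
        { gsub("[`;]", "") }                 # strip backticks and semicolons
        /^USE /          { db = $2 }         # remember the current database
        /CREATE TABLE /  { table = $3 }      # remember the current table
        /^\) ENGINE/ {
            sub(".*CHARSET=", "")            # leave the charset as field 1
            if (!match($1, "utf8")) {
                print db ":" table ":" $1
            }
        }
    '
}

# Hand-written sample imitating mysqldump --no-data output (hypothetical names).
sample_dump() {
    cat <<'EOF'
USE `shop`;
CREATE TABLE `orders` (
  `id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `customers` (
  `id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EOF
}

sample_dump | find_non_utf8_tables
# prints: shop:orders:latin1
```

Each output line has the form database:table:charset. Tables whose default charset contains utf8 (including utf8mb4) are skipped. Against a real server, replace sample_dump with mysqldump --all-databases --no-data.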