Is there a MySQL performance benchmark to measure the impact of utf8_unicode_ci versus utf8_general_ci?
Solution 1:
Well, I did not find any benchmarks in the Internet, so I decided to made benchmarks myself.
I created a very simple table with 500000 rows:
CREATE TABLE test(
ID INT(11) DEFAULT NULL,
Description VARCHAR(20) DEFAULT NULL
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci;
Then I filled it with random data by running this stored procedure:
CREATE PROCEDURE randomizer()
BEGIN
DECLARE i INT DEFAULT 0;
DECLARE random CHAR(20) ;
theloop: loop
SET random = CONV(FLOOR(RAND() * 99999999999999), 20, 36);
INSERT INTO test VALUES (i+1, random);
SET i=i+1;
IF i = 500000 THEN
LEAVE theloop;
END IF;
END LOOP theloop;
END
Then I created the following stored procedures to benchmark simple SELECT, SELECT with LIKE, and sorting (SELECT with ORDER BY):
CREATE benchmark_simple_select()
BEGIN
DECLARE i INT DEFAULT 0;
theloop: loop
SELECT * FROM test WHERE Description = 'test' COLLATE utf8_general_ci;
SET i = i + 1;
IF i = 30 THEN
LEAVE theloop;
END IF;
END LOOP theloop;
END
CREATE PROCEDURE benchmark_select_like()
BEGIN
DECLARE i INT DEFAULT 0;
theloop: loop
SELECT * FROM test WHERE Description LIKE '%test' COLLATE utf8_general_ci;
SET i = i + 1;
IF i = 30 THEN
LEAVE theloop;
END IF;
END LOOP theloop;
END
CREATE PROCEDURE benchmark_order_by()
BEGIN
DECLARE i INT DEFAULT 0;
theloop: loop
SELECT * FROM test WHERE ID > FLOOR(1 + RAND() * (400000 - 1)) ORDER BY Description COLLATE utf8_general_ci LIMIT 1000;
SET i = i + 1;
IF i = 10 THEN
LEAVE theloop;
END IF;
END LOOP theloop;
END
In the stored procedures above utf8_general_ci collation is used, but of course during the tests I used both utf8_general_ci and utf8_unicode_ci.
I called each stored procedure 5 times for each collation (5 times for utf8_general_ci and 5 times for utf8_unicode_ci) and then calculated the average values.
Here are the results:
benchmark_simple_select() with utf8_general_ci: 9957 ms
benchmark_simple_select() with utf8_unicode_ci: 10271 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 3.2%.
benchmark_select_like() with utf8_general_ci: 11441 ms
benchmark_select_like() with utf8_unicode_ci: 12811 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 12%.
benchmark_order_by() with utf8_general_ci: 11944 ms
benchmark_order_by() with utf8_unicode_ci: 12887 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 7.9%.
Solution 2:
I didn't see any benchmark, but you can run your own using the BENCHMARK function:
BENCHMARK(count,expr)
As advised by Matthew you may run a parallel installation of MYSQL, but consider that there could be a huge difference between different architecture (sparc, intel, 32bit, 64bit, ...).