How to improve INSERT INTO ... SELECT locking behavior

In our production database, we run the following batch query (pseudo-code SQL) every hour:

INSERT INTO TemporaryTable
    (SELECT * FROM HighlyContentiousTableInInnoDb
     WHERE allKindsOfComplexConditions are true)

Now this query itself does not need to be fast, but I noticed it was locking up HighlyContentiousTableInInnoDb, even though it was just reading from it. This made some other, very simple queries take ~25 seconds (which is how long the batch query takes).

Then I discovered that InnoDB tables in such a case are actually locked by a SELECT! https://www.percona.com/blog/2006/07/12/insert-into-select-performance-with-innodb-tables/

But I don't really like the article's solution of selecting into an OUTFILE. It seems like a hack (temporary files on the filesystem seem sucky). Any other ideas? Is there a way to make a full copy of an InnoDB table without locking it in this way during the copy? Then I could just copy HighlyContentiousTableInInnoDb to another table and run the query there.


Solution 1:

The answer to this question is much easier now: use Row Based Replication and the Read Committed isolation level.

The locking you were experiencing disappears.

Longer explanation: http://harrison-fisk.blogspot.com/2009/02/my-favorite-new-feature-of-mysql-51.html

Solution 2:

You can set the binlog format like this:

SET GLOBAL binlog_format = 'ROW';

Edit my.cnf if you want to make it permanent:

[mysqld]
binlog_format=ROW

Set the isolation level for the current session before you run your query:

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
INSERT INTO t1 SELECT ....;

If this doesn't help, try setting the isolation level server-wide rather than only for the current session (the global setting applies only to connections opened afterwards):

SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

Edit my.cnf if you want to make it permanent:

[mysqld]
transaction-isolation = READ-UNCOMMITTED

You can change READ-UNCOMMITTED to READ-COMMITTED, which is a better isolation level.
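
A minimal sketch of the session-level variant with READ COMMITTED, plus a check that the settings actually took effect. It reuses the table names from the question. The SHOW VARIABLES pattern is written loosely on purpose, on the assumption that the isolation variable may be named tx_isolation or transaction_isolation depending on the server version. Checking binlog_format first matters because, with binary logging enabled, InnoDB refuses to write under READ COMMITTED when the format is STATEMENT.

```sql
-- Confirm the binary log format in effect for this session (expect ROW).
SHOW VARIABLES LIKE 'binlog_format';

-- Relax the isolation level for this connection only.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Confirm the isolation level; the pattern matches either variable name.
SHOW VARIABLES LIKE '%isolation';

-- Run the batch copy; the real WHERE conditions from the question go here.
INSERT INTO TemporaryTable
SELECT * FROM HighlyContentiousTableInInnoDb;
```

Because the change is scoped to the session, other connections keep the server default.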

Solution 3:

Everyone using InnoDB tables has probably got used to the fact that InnoDB tables perform non-locking reads, meaning that unless you use modifiers such as LOCK IN SHARE MODE or FOR UPDATE, SELECT statements will not lock any rows while running.

This is generally correct, but there is a notable exception: INSERT INTO table1 SELECT * FROM table2. This statement performs a locking read (shared locks) on table2. The same applies to similar statements with WHERE clauses and joins. What matters is that the table being read is InnoDB, even if the writes go to a MyISAM table.

So why was this done, given that it is pretty bad for MySQL performance and concurrency?

The reason is replication. In MySQL before 5.1, replication is statement based, which means statements replayed on the slave should have the same effect as they had on the master. If InnoDB did not lock rows in the source table, another transaction could modify a row and commit before the transaction running the INSERT ... SELECT statement. That transaction would then be applied on the slave before the INSERT ... SELECT statement, possibly resulting in different data than on the master. Locking rows in the source table while reading them protects against this: if another transaction modifies rows before INSERT ... SELECT has had a chance to access them, they will be modified in the same order on the slave. If a transaction tries to modify a row after it was accessed, and so locked, by INSERT ... SELECT, it has to wait until the statement completes, which ensures it is executed on the slave in the proper order. Getting pretty complicated? Well, all you need to know is that it had to be done for replication to work right in MySQL before 5.1.
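
The blocking described above can be observed with two connections. This is only a sketch: it assumes the default REPEATABLE READ isolation level, that table1 and table2 have the same structure, and that table2 has hypothetical columns id and val.

```sql
-- Session A: copy rows and leave the transaction open.
START TRANSACTION;
INSERT INTO table1 SELECT * FROM table2;
-- Shared locks on the rows read from table2 are held until the transaction ends.

-- Session B (a second connection): try to change a row that session A read.
UPDATE table2 SET val = val + 1 WHERE id = 1;
-- This statement waits until session A finishes or the lock wait times out.

-- Session A: release the locks.
COMMIT;
-- Session B's UPDATE can now proceed.
```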

In MySQL 5.1 this, as well as a few other problems, should be solved by row based replication. However, I have yet to give it real stress tests to see how well it performs :)

One more thing to take into account: INSERT ... SELECT actually performs its read in locking mode, so it partially bypasses versioning and retrieves the latest committed row. So even if you're operating in REPEATABLE-READ mode, this operation will be performed in READ-COMMITTED mode, potentially giving a different result from what a pure SELECT would give. This, by the way, applies to SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE as well.

One may ask: what if I'm not using replication and have my binary log disabled? If replication is not used, you can enable the innodb_locks_unsafe_for_binlog option, which relaxes the locks InnoDB sets on statement execution and generally gives better concurrency. However, as the name says, it makes locks unsafe for replication and point-in-time recovery, so use the innodb_locks_unsafe_for_binlog option with caution.

Note that disabling binary logs is not enough to trigger relaxed locks. You have to set innodb_locks_unsafe_for_binlog=1 as well. This is done so that enabling the binary log does not cause unexpected changes in locking behavior and performance problems. You can also sometimes use this option with replication, if you really know what you're doing. I would not recommend it unless it is really needed, as you might not know which other locks will be relaxed in future versions and how that would affect your replication.

Solution 4:

Disclaimer: I'm not very experienced with databases, and I'm not sure if this idea is workable. Please correct me if it's not.

How about setting up a secondary, equivalent table HighlyContentiousTableInInnoDb2, and creating AFTER INSERT, AFTER UPDATE and AFTER DELETE triggers on the first table which keep the new table updated with the same data? Then you should be able to lock HighlyContentiousTableInInnoDb2 and only slow down the triggers of the primary table, instead of all queries.

Potential problems:

  • 2 x data stored
  • Additional work for all inserts, updates and deletes
  • Might not be transactionally sound
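
A rough sketch of what such triggers might look like in MySQL. The column names id (assumed to be the primary key) and payload are hypothetical, as are the trigger names; the real table's columns would need to be substituted.

```sql
-- Create an empty copy with the same structure (no data is copied).
CREATE TABLE HighlyContentiousTableInInnoDb2 LIKE HighlyContentiousTableInInnoDb;

-- Mirror new rows into the secondary table.
CREATE TRIGGER hct_after_insert AFTER INSERT ON HighlyContentiousTableInInnoDb
FOR EACH ROW
  INSERT INTO HighlyContentiousTableInInnoDb2 (id, payload)
  VALUES (NEW.id, NEW.payload);

-- Mirror changes to existing rows, matched on the old primary key.
CREATE TRIGGER hct_after_update AFTER UPDATE ON HighlyContentiousTableInInnoDb
FOR EACH ROW
  UPDATE HighlyContentiousTableInInnoDb2
  SET id = NEW.id, payload = NEW.payload
  WHERE id = OLD.id;

-- Mirror deletions.
CREATE TRIGGER hct_after_delete AFTER DELETE ON HighlyContentiousTableInInnoDb
FOR EACH ROW
  DELETE FROM HighlyContentiousTableInInnoDb2 WHERE id = OLD.id;
```

Two caveats with this sketch: the secondary table starts empty, so the existing rows still have to be copied once, and a trigger runs as part of the statement that fired it, so a write to the primary table waits whenever its trigger is blocked on the secondary table.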