I/O wait causing severe slowdown (ext4 jbd2 at 99% I/O) during MySQL commit

I am writing an indexer in Python which indexes documents and inserts them into a database. It used to be a single process, but I have now switched to multiprocessing with 4 parallel worker processes. After every text extraction, each process inserts into the database and commits.
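Roughly, each worker does something like this (a simplified sketch; the table layout, connection details and extract_text below are placeholders, not my real code):

import multiprocessing
import MySQLdb  # mysqlclient / MySQL-python

def extract_text(path):
    # stands in for the real text-extraction step
    with open(path, "rb") as f:
        return f.read().decode("utf-8", errors="ignore")

def index_documents(paths):
    db = MySQLdb.connect(host="localhost", user="indexer",
                         passwd="placeholder", db="docs")
    cur = db.cursor()
    for path in paths:
        text = extract_text(path)
        cur.execute("INSERT INTO documents (path, body) VALUES (%s, %s)",
                    (path, text))
        db.commit()  # one commit (and one flush to disk) per document
    db.close()

if __name__ == "__main__":
    # 4 parallel worker processes, each committing after every document
    batches = [["a.txt"], ["b.txt"], ["c.txt"], ["d.txt"]]  # placeholder input
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(index_documents, batches)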

Now it is hitting an I/O problem. The main source of I/O is not my processes but ext4's journaling thread, jbd2, which sits at 99.99% I/O utilization and causes the CPU to wait for I/O on every MySQL commit.

I have seen many people with this problem on the internet, and their solution is to mount with barrier=0. Would that disable journaling completely? My servers have a UPS and I am tempted to do it. Should I?
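For reference, the change suggested in those posts is just a mount option, something along these lines in /etc/fstab (device and mount point here are placeholders, not my actual layout):

/dev/sdb1  /var/lib/mysql  ext4  defaults,barrier=0  0 0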


Put the database on a non-journaling file system. The larger database servers (Oracle, SQL Server) at least have their own journaling mechanism (the transaction log) and optimize their I/O accordingly. You put the log and the database on separate file systems and discs and rely on the database's internal functionality to handle interrupted I/O. In a larger setup there are normally no file system changes except the write date anyway, because the files do not grow - they are created at their "final" size (ok, admins can change that) - and data changes are, as I said, tracked by the database-level transaction log.
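As a rough sketch of that layout in /etc/fstab (device names, mount points and the choice of ext2 for the non-journaling part are illustrative assumptions, not a prescription):

# data files on a non-journaling file system, logs on a separate disc/file system
/dev/sdc1  /var/lib/mysql/data  ext2  defaults,noatime  0 0
/dev/sdd1  /var/lib/mysql/log   ext4  defaults,noatime  0 0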

You may also want to tell us what your hardware layer looks like. Most people underestimate that IOPS is the limiting factor for a database and think a small set of discs is a proper environment for a large database, while some of us work with databases spread over a larger number of discs and thus a potentially much higher number of IOPS.


There is always going to be a trade-off between resiliency and performance.

With MySQL on ext4, the barrier=1 default does indeed cause a slowdown; however, the first action should not be to disable journaling or to turn on data=writeback.

First, if resiliency is of high importance, a battery-backed RAID is certainly well worth it.

The mount options I have chosen, especially on non-battery-backed RAID, are:

/dev/mapper/vg-mysql--data  /var/lib/mysql/data ext4  defaults,noatime,nodiratime,barrier=1,data=ordered  0 0

This intentionally does not use data=writeback, because I do not want to risk the filesystem corruption that results in "old data to appear in files after a crash and journal recovery" (quote from man mount).

The ideal my.cnf configuration for full resiliency, as far as the I/O-related settings go, is:

[mysqld]
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

I have opted for the following sequence of trade-offs to increase performance (a combined sketch of the resulting configuration follows the list):

  1. sync_binlog = 0: this is the first MySQL setting I change away from full resiliency. The reason is that it gives a significant performance improvement, especially where binlog_format=row (unfortunately required for Jira). I am running enough MySQL replicas in the cluster that if the binlog were to become corrupted in a power-loss scenario, I would do a binary copy from another replica.
  2. innodb_flush_log_at_trx_commit = 2: While a value of 1 is required for full ACID compliance, with a value of 2 "the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues." (quote from MySQL docs)
  3. Update the mount options to use data=writeback. Note that if this is your root file system you will also need to pass a kernel command line option. I put together a few steps on that at coderwall.
  4. Test various values of innodb_flush_method. O_DIRECT is shown to improve performance in some workloads, but it's not a given that this will work in your environment.
  5. Upgrade to SSDs, in which case you'll also want to increase innodb_io_capacity, and tune settings such as innodb_adaptive_flushing, innodb_read_io_threads, innodb_write_io_threads, innodb_purge_threads, and other possible settings.
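
Put together, a hedged sketch of what the performance-oriented end state could look like (the SSD-related values below are illustrative starting points, not recommendations):

[mysqld]
sync_binlog = 0
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT   # step 4: benchmark in your own environment
# step 5, only relevant once on SSDs (tune for your hardware):
innodb_io_capacity = 2000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_purge_threads = 4

with the mount entry from above switched to data=writeback (step 3):

/dev/mapper/vg-mysql--data  /var/lib/mysql/data ext4  defaults,noatime,nodiratime,barrier=1,data=writeback  0 0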