RAID6 resync with fast writes but slow reads

I'm using Debian Jessie.

# uname -a
Linux host 4.9.0-0.bpo.3-amd64 #1 SMP Debian 4.9.30-2+deb9u5~bpo8+1 (2017-09-28) x86_64 GNU/Linux

And have set up a RAID6.

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid6 sda[0] sdd[3] sdc[2] sdb[1]
      19532611584 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 1/73 pages [4KB], 65536KB chunk

These are 4x Seagate Enterprise 10TB 7200rpm drives. When copying a large file from the RAID array to the internal system disk (an SSD) I get an average throughput of 220MB/s. Copying large files from the SSD to the array runs at 145MB/s. When the monthly RAID check is running (started by a cron job executing checkarray --cron --all --idle --quiet, which is the default behaviour) I can see

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid6 sda[0] sdd[3] sdc[2] sdb[1]
      19532611584 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  check =  0.7% (72485728/9766305792) finish=817.2min speed=197694K/sec
      bitmap: 1/73 pages [4KB], 65536KB chunk

So resync speed is great as well. Now, there is some strange behaviour. While the check is running I can write data onto the array in parallel with good performance. Write speed is ~100MB/s and one can see the RAID sync speed decrease. After the copy to the array has finished, the sync speed increases to its previous value again. The problem is reads from the array while the check is running: reads only manage <20MB/s, and the RAID resync speed does not decrease. I have no idea what the reason for this is.
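For reference, per-disk activity during the check can be watched with iostat from the sysstat package (the 5-second interval and the device names are just examples for my four member disks):

# iostat -x 5 sda sdb sdc sdd

Comparing %util and await during a slow read with the values seen during a fast write shows whether the member disks are actually saturated.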

# ps aux | grep md0
root       211  0.4  0.0      0     0 ?        S    Okt22  93:40 [md0_raid6]
root       648  0.0  0.0      0     0 ?        S    Okt22   0:01 [jbd2/md0-8]
root     15361  4.4  0.0      0     0 ?        DN   02:25   0:00 [md0_resync]
root     15401  0.0  0.0  12752  2040 pts/2    S+   02:26   0:00 grep md0
# ionice -p 211
none: prio 0
# ionice -p 15361
idle

The resync process is set to idle, which is correct. The I/O scheduler is set to CFQ for all underlying physical discs.
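For completeness, the active scheduler per disk can be checked like this (sda as an example; the bracketed entry is the active one):

# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]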

There is also a RAID1 in this system:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md1 : active raid1 sde[0] sdf[1]
      3906887488 blocks super 1.2 [2/2] [UU]
      bitmap: 2/30 pages [8KB], 65536KB chunk

This array has no such problem. I can write to and read from the array with good speeds while its check is running. Watching /proc/mdstat, the sync speed decreases during I/O activity and increases again once it has finished. But not for read I/O on md0/RAID6. Since md0's normal sync speed is very good, normal reads and writes without a resync are good, and even writes to the array while the RAID check is running are very good, why are reads so bad while the monthly check is running?


Let me start by saying that I have no real idea about mdadm or Debian - I think, however, that the effect you're seeing is a very general one.

That writes are normally slower than reads is to be expected when you look at how RAID 6 works: on reads, all four disks can be read from simultaneously. The parity data is skipped and instead the next data segment can be read ahead into cache. The best read speed that can be achieved is n times the speed of a single disk.

On write, the data is augmented by two parity segments that need to be written to disk as well. When all disks write at the same time, the best speed that can be achieved is n-2 times the single-disk speed.
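As a rough illustration (the ~190MB/s per-disk figure is just an assumed sustained rate for drives of this class, not a measurement from your system):

best-case read speed : n     x single disk = 4 x 190MB/s = 760MB/s
best-case write speed: (n-2) x single disk = 2 x 190MB/s = 380MB/s

In practice other bottlenecks (the SSD on the other end, filesystem and bus overhead) keep the measured numbers well below these theoretical ceilings.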

A RAID 6 resync or rebuild with few, large disks takes a long time. Essentially, every stripe has to be read and compared to the redundancy data that is also stored on the disks. The disks are heavily loaded and any productive I/O has to compete with all those reads. This is why reads are slow. To get decent read latency, the background sync needs to run at a low priority, i.e. it needs to stop and pause for a moment when other I/O is sensed.
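In md this throttling is governed by the global resync speed limits; as far as I know the kernel drops the check/resync towards the minimum when it notices competing I/O (the values below are the usual defaults in KB/s per device, yours may differ):

# sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
dev.raid.speed_limit_min = 1000
dev.raid.speed_limit_max = 200000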

Writes on the other hand go to cache first and will seem to finish right away - as long as there is cache available. The real write happens in the background at some point. Only when the amount written exceeds the caching capacity will you notice a serious slowdown.
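You can take the page cache out of the measurement if you want to see the real write speed during a check; a simple sketch (path and size are placeholders):

# dd if=/dev/zero of=/mnt/array/testfile bs=1M count=8192 conv=fdatasync

conv=fdatasync makes dd flush the file to disk before reporting the transfer rate, so the reported speed includes the actual writes to the array.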

To get read speeds to a decent level during a resync, you either need to make the background check run at a slow pace to start with, or figure out a way to make it pause while productive reads are going on.
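A sketch of the knobs I would look at, assuming md0 is the affected array (I can't say which value will give acceptable read latency on your hardware):

# cap the check at roughly 50MB/s per device for this array only
echo 50000 > /sys/block/md0/md/sync_speed_max
# or abort the running check entirely; the next cron run will start it again
echo idle > /sys/block/md0/md/sync_action

On Debian the running check can also be cancelled with the same script that starts it, checkarray --cancel (or -x) /dev/md0, if I remember the option correctly.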