MD RAID1 read balancing algorithm

Solution 1:

The Linux implementation of RAID1 speeds up disk reads only when at least two separate read operations are in flight at the same time. That means reading one file won't be any faster on RAID1 than on a single disk, but reading two distinct files at the same time will be faster.

Read test done with dd, with the read cache disabled:

Test single file:
1048576000 copied @ 224MB/s

Test same file 2 transfers:
1048576000 copied @ 116MB/s
1048576000 copied @ 104MB/s

Test 2 files 2 transfers:
1048576000 copied @ 212MB/s
1048576000 copied @ 217MB/s
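
If you want to repeat this kind of comparison without juggling several dd processes, here is a minimal Python sketch. It is not the script used for the numbers above. The names timed_read and read_concurrently are made up for illustration, and the posix_fadvise call is only a best-effort hint to drop cached pages, so results may still be affected by the page cache.

```python
import os
import threading
import time


def timed_read(path, block_size=1 << 20):
    """Read `path` sequentially; return (bytes_read, elapsed_seconds)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Best-effort hint: ask the kernel to drop cached pages for this file.
        # This is advisory only and is skipped where it is not available.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        total = 0
        start = time.monotonic()
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            total += len(chunk)
        return total, time.monotonic() - start
    finally:
        os.close(fd)


def read_concurrently(paths, block_size=1 << 20):
    """Start one reader thread per path; return results in the same order."""
    results = [None] * len(paths)

    def worker(index, path):
        results[index] = timed_read(path, block_size)

    threads = [
        threading.Thread(target=worker, args=(i, p)) for i, p in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Passing one path gives the single-file case, passing the same path twice gives the "same file 2 transfers" case, and passing two different paths on the array gives the "2 files 2 transfers" case. Divide each returned byte count by its elapsed time to get the throughput.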

As for the options, LUKS on top of a single MD device sounds more logical.

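The problem of your reads all coming from the same disk is governed by the best_dist_disk and best_pending_disk selection logic. As far as I can tell these are variables inside the kernel's read_balance() function in drivers/md/raid1.c rather than runtime settings, so tweaking them means changing the kernel code. You can see a complete example here.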

Solution 2:

If you only have a single stream of sequential I/O, the md RAID1 algorithm will keep picking the same disk. From the md(4) man page:

[On md RAID1] a single stream of sequential input will not be accelerated (e.g. a single dd), but multiple sequential streams or a random workload will use more than one spindle. In theory, having an N-disk RAID1 will allow N sequential threads to read from all disks.

You can read the source code for the 5.10 kernel to see the md RAID1 balancing algorithm. The rough overview is:

  • Balancing can only happen in regions where there are multiple disks that are in-sync and are not faulty
  • Balancing will try to avoid disks marked as "write mostly"
  • If a disk's last I/O finished exactly before the area that wants to be read, balancing will continue with that disk unless the new I/O's size is too big (this is why a single sequential stream is not accelerated)
  • If the above didn't give us a disk, check if any of the disks is an SSD or if any of the disks has no pending I/O. If either of these cases is true, pick the disk with the least amount of pending I/O.
  • If the above didn't give us a disk, pick the disk whose most recent I/O was closest to the desired read's location
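
The steps above can be sketched as a small Python model. This is only an illustration of the overview, not a translation of the kernel code: the Disk class, pick_read_disk, the sector units and the seq_limit threshold are all hypothetical, and the real read_balance() handles more cases (bad blocks, resync windows and so on).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Disk:
    name: str
    in_sync: bool = True
    faulty: bool = False
    write_mostly: bool = False
    rotational: bool = True   # False models an SSD
    pending: int = 0          # number of in-flight requests
    next_sector: int = 0      # sector right after the end of the last I/O


def pick_read_disk(disks: List[Disk], sector: int, size: int,
                   seq_limit: int = 2048) -> Optional[Disk]:
    """Pick a mirror for a read of `size` sectors starting at `sector`.

    `seq_limit` is an assumed stand-in for the "I/O size is too big" check.
    """
    # Only in-sync, non-faulty disks can serve the read.
    usable = [d for d in disks if d.in_sync and not d.faulty]
    if not usable:
        return None

    # Avoid write-mostly disks unless nothing else is left.
    candidates = [d for d in usable if not d.write_mostly] or usable

    # Sequential continuation: stay on the disk whose last I/O ended here.
    for d in candidates:
        if d.next_sector == sector and size <= seq_limit:
            return d

    # Any SSD, or any idle disk: choose the least loaded disk.
    if any(not d.rotational for d in candidates) or \
            any(d.pending == 0 for d in candidates):
        return min(candidates, key=lambda d: d.pending)

    # Otherwise choose the disk whose last I/O position is closest.
    return min(candidates, key=lambda d: abs(d.next_sector - sector))


if __name__ == "__main__":
    sda = Disk("sda", next_sector=100, pending=2)
    sdb = Disk("sdb", next_sector=5000, pending=1)
    # A read that continues where sda left off stays on sda.
    print(pick_read_disk([sda, sdb], sector=100, size=8).name)
    # A read elsewhere goes to the disk with the nearest last position.
    print(pick_read_disk([sda, sdb], sector=4000, size=8).name)
```

In this model a single sequential reader always hits the continuation rule, which is why one dd keeps landing on the same disk, while a second stream at a different offset ends up on another mirror.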