Can I increase the read error threshold on Linux software RAID?

I have two RAID 10 volumes, and only one partition on /dev/sda was kicked out of one of them. Here's /proc/mdstat (in the middle of recovery):

md1 : active raid10 sda3[4] sdd3[3] sdc3[2] sdb3[1]
      11719732224 blocks super 1.2 512K chunks 2 near-copies [4/3] [_UUU]
      [===================>.]  recovery = 97.7% (5725121408/5859866112) finish=100.5min speed=22334K/sec

md0 : active raid10 sda2[4] sdd2[3] sdc2[2] sdb2[1]
      1043456 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

Based on the following kernel message, I see that the sda3 partition was dropped from md1 after only 21 read errors:

Apr 17 14:25:05 someserver kernel: md/raid10:md1: sda3: Raid device exceeded read_error threshold [cur 21:max 20]

Apr 17 14:25:05 someserver kernel: md/raid10:md1: sda3: Failing raid device

Based on my research, though, that count may only cover errors that occurred within a short period of time, not all errors over the drive's lifetime.

Since smartctl reports zero reallocated sectors on the physical 6TB drive, I don't think the drive needs to be thrown out yet, and I have multiple copies of the data this server stores.

That being the case, I went ahead and re-added the partition to the md1 array, and 3 days into the repair it's almost complete. (The system is mirroring another system at the same time, so it's a very busy server, which is slowing down the repair.) I'm concerned that as soon as this drive or one of the other three 6TB drives in the array runs into a bad sector, it will quickly be ejected from the array, requiring another repair.

Is there a way to increase the read_error threshold above 20 so that it tries harder before failing the device?


I am way too late, but this was almost impossible to find without help, so I'll post the solution.

Set it via sysfs

The threshold is configurable from here:

/sys/block/md*/md/max_read_errors

So you can set it to 50 on the md1 device like this:

# echo 50 > /sys/block/md1/md/max_read_errors
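If you want something a bit more careful than the one-liner above, here is a minimal sketch of a wrapper that checks the attribute exists, prints the old value, and then writes the new one. The function name `set_max_read_errors` and the optional third argument (an alternate sysfs root, only there so the function can be tried against a scratch directory) are my own inventions, not part of mdadm or the kernel.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): set the md read-error threshold.
# Usage: set_max_read_errors <md-device> <value> [sysfs-root]
# The third argument is an assumption added for dry runs; it defaults to /sys.
set_max_read_errors() {
    dev=$1
    value=$2
    root=${3:-/sys}
    attr="$root/block/$dev/md/max_read_errors"

    # Bail out if the attribute is missing or not writable
    # (wrong device name, not an md array, or not running as root).
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi

    echo "old: $(cat "$attr")"
    echo "$value" > "$attr"
    echo "new: $(cat "$attr")"
}
```

Run as root, `set_max_read_errors md1 50` should do the same thing as the echo above. As far as I can tell the value is a runtime setting, so expect to have to reapply it after a reboot or after the array is reassembled.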

Source

I got the answer from user frostschutz in the #linux channel on libera.chat.

It is also listed on https://access.redhat.com/solutions/5249861

But I haven't found any official MD RAID or kernel documentation that mentions this.