Does RAID 1 protect against corruption?
RAID-1 protects against the complete failure of one of the two drives. As long as a drive is not marked as failed, its contents are assumed to be accurate. But if, for whatever reason, one of the two drives returns inconsistent data without failing outright, that error will not be detected by the RAID layer, and the application will get bad data.
Many controllers have a verification process that runs periodically, but its purpose is to test for disk failure, not data integrity. Hard drives also implement their own per-sector integrity checks (ECC codes) which they use to spot bad sectors, but those codes are designed to be fast and compact rather than thorough, so errors can still leak through.
While data corruption is the exception rather than the rule, it's also not unheard of. A member of the ZFS team, for example, reported in an interview seeing corrupt data served up by a high-end RAID-5 device, which they spotted only because ZFS implements checksums at the filesystem level.
As others have noted, a RAID 1 system has no way to tell which of two differing copies of a sector is the bad one.
Higher-end RAID systems run a scrub operation in the background to compare both copies and flag differences. Better yet is a system that reads both copies, one from each drive, on every read and compares them at read time. Resolving those differences, however, is impossible for the RAID controller: with only two copies and no checksum, there is nothing to say which one to trust.
On Linux systems using mdadm, a scrub can be initiated through the md "sync_action" interface; quoting the md documentation:
md arrays can be scrubbed by writing either check or repair to the file md/sync_action in the sysfs directory for the device.
Requesting a scrub will cause md to read every block on every device in the array, and check that the data is consistent. For RAID1 and RAID10, this means checking that the copies are identical. For RAID4, RAID5, RAID6 this means checking that the parity block is (or blocks are) correct.
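A minimal sketch of that procedure, assuming the array is /dev/md0 (substitute your own array name):

    # Start a read-only consistency check on md0:
    echo check > /sys/block/md0/md/sync_action

    # Watch scrub progress:
    cat /proc/mdstat

    # Once the check completes, mismatch_cnt reports how many
    # inconsistent blocks were found (0 means the copies agreed):
    cat /sys/block/md0/md/mismatch_cnt

    # Writing "repair" instead makes md rewrite inconsistencies; on
    # RAID 1 it simply copies one side over the other, with no way
    # to know which copy was the good one:
    echo repair > /sys/block/md0/md/sync_action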
RAID 1 is all about protecting against sudden total drive failure; look elsewhere for protection against corruption. Beyond that, RAID 1 offers no "history", so it can't recover from human or software error either. For those cases, look to a checksumming filesystem like ZFS or a history-preserving filesystem like HAMMER.
It depends on where the corruption stems from. If a drive in a RAID 1 mirror is screwy and is writing nonsense, the drive will eventually be marked as failed, the mirror will degrade, the good drive will remain in use, and you'll still have the good files. RAID 5 in its simplest form spreads two drives' worth of data plus one drive's worth of parity across three drives; if one of the three drives is failing to write sectors properly, it will be failed out, and the remaining two drives hold enough data plus parity between them to reconstruct everything (see the sketch below).
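To illustrate how that parity reconstruction works, here is a byte-level sketch (the drive labels and byte values are made up):

    # RAID-5 parity is a bytewise XOR across the data blocks in a stripe.
    d1=$(( 0xA5 ))          # byte from "drive 1"
    d2=$(( 0x3C ))          # byte from "drive 2"
    parity=$(( d1 ^ d2 ))   # parity byte stored on the third drive

    # If drive 1 dies, XORing the survivors recovers its byte:
    printf 'recovered d1 = 0x%02X (expected 0xA5)\n' $(( d2 ^ parity ))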
Now let's look at what happens if the corruption is caused by a virus or a bug in a program. In RAID 1 and RAID 5, no drive will be taken out of service, because every drive is writing exactly what it was told to write; nothing has failed. The files will still be destroyed, though, because the virus or bug is writing junk, and the array will dutifully write that junk to both drives of a RAID 1 mirror and to all three drives of a RAID 5 set.
That is why RAID is not backup. It protects against the most likely failure, which is a whole-disk failure, but it doesn't account for a lot of other scenarios.
In practice, yes. The vast majority of hard drive failures are all-or-nothing. Either (a) the cable is unplugged or the drive microcontroller has failed, so the RAID controller gets no response at all -- an obvious failed drive. Or (b) the cable and drive microcontroller are good, but when it tries to read a sector, the internal drive microcontroller detects data corruption because the internal ECC checksum failed, and repeated attempts to read that sector (in case it was a temporary read glitch) eventually time out, so the RAID controller gets a polite "sorry" response -- again, an obvious failed drive. Either way, it is obvious to the RAID-1 or RAID-5 controller that the drive has failed.
In principle, no. If something has gone so badly wrong that a hard drive is writing nonsense, yet is somehow still working well enough to write a correct internal ECC code for that nonsense, then RAID-1 can't tell which drive is correct, and on a resync it will likely overwrite the good data with the corrupt data. RAID-5 is no better; the "RAID-5 write hole" (a power failure during an active write, leaving data and parity out of sync) is one particular rare but not impossible case.
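You can see that dilemma directly on an md mirror. A sketch, assuming hypothetical member devices /dev/sdb1 and /dev/sdc1 and a made-up block offset:

    # Read the same 4 KiB block straight from each mirror member:
    dd if=/dev/sdb1 of=/tmp/copy-a bs=4096 skip=12345 count=1
    dd if=/dev/sdc1 of=/tmp/copy-b bs=4096 skip=12345 count=1

    # cmp proves the two halves can disagree, but nothing in the
    # mirror itself says which half holds the good data:
    cmp /tmp/copy-a /tmp/copy-b || echo "mirror halves disagree"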
As far as I know, the only way to avoid such corruption is to use end-to-end checksums in addition to mirroring, either automatically as part of the filesystem (ZFS or Btrfs) or periodically and manually (recalculated rsync checksums, simple file verification, Parchive file sets, etc.); ideally with a cryptographic hash such as SHA-256.
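A minimal sketch of the manual approach, assuming the data lives under a hypothetical /data directory:

    # Build a SHA-256 manifest covering every file under /data:
    find /data -type f -print0 | xargs -0 sha256sum > /root/data.sha256

    # Later (e.g. from a cron job), verify the files against the
    # manifest; any "FAILED" line points at silent corruption:
    sha256sum --check --quiet /root/data.sha256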