RAID 5 detects write errors?

I have seen people recommend RAID 10 over RAID 5 for databases due to RAID 10 giving better performance and a better chance of recovering from a hardware failure.

This confuses me, as I thought the point of using RAID 5 was that its parity allows detecting and correcting write errors to ensure the integrity of the data. My understanding was that RAID 10 cannot recover from write errors. I.e. if a bit has an error, it will simply differ from the corresponding bit on the mirrored drive, and thus it will be impossible to tell which bit has the error and which one is correct.

However, I tried googling along the lines of detect "write error" with raid 5 vs raid 10 to see if anyone covered this point, and came up empty-handed.

Am I making this all up in my head?

Can a RAID 5 array detect and recover from write errors using the parity bit? Or does detection not occur until much later, when the data is read and the parity indicates an error?

If a RAID 10 array has a write error, will it be able to determine which of the mirrored bits is the one in error? I.e. does the drive indicate a read failure for that particular bit, or does it just see that the bits do not match and, since there is no parity, have no way to determine which one is in error?

I see some discussion of rebuilds being triggered by a read error. Do write errors not get detected until later, when the data is being read? In other words, does the write error occur but the erroneous data just sit there until, possibly much later, the data is read and the parity indicates an error? Is that why you are at risk of additional read errors during a rebuild: you could have written a large amount of data with errors, but the errors will not be detected until the next time the data is read?

I would like to clarify that tape backups do not address the above question. If you have a scenario where data integrity is very important, and you can't detect write errors, then all the tape backups in the world won't help you if the data you are backing up already has errors.


Solution 1:

I believe the case you are worried about is the one where there is a failed write that the drive does not report. This is a critical failure of a drive, so manufacturers strive to make sure it never happens. The storage stack is built on the assumption that the terminal storage device will report both read and write errors.

I have seen some specialist systems perform a read immediately after a write to ensure the data really was committed, but not in the last 10 years.

To answer your question, neither RAID handles the stated error better than the other.

Where they do differ is in handling write errors reported by the device. RAID 5 responds in a vendor-specific way; it could, for instance, recommit the most recent write along with a recomputed parity block. With RAID 1, the member of the mirror pair that did not return a write error can be assumed to be correct, and that one block is copied from the good member to the bad member.

Solution 2:

Neither can recover from write errors unless the RAID vendor is doing some sort of checksum process. RAID protects against disk failure. In RAID 5, when a disk is replaced, the parity information is used to rebuild the missing data. In RAID 10, when a disk is replaced, the data is copied from the partner disk.
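To make the parity point concrete, here is a minimal sketch (in Python, with made-up two-byte blocks) of how RAID 5's XOR parity works. It shows why parity can rebuild a disk that is *known* to have failed, yet cannot by itself locate a silently corrupted block: the parity check only tells you that *something* in the stripe is wrong.

```python
from functools import reduce

def xor_parity(blocks):
    """Compute the RAID 5 parity block as the byte-wise XOR of all blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three data blocks of a stripe on a hypothetical 4-disk RAID 5.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_parity(data)

# Rebuild: when disk 1 is *known* to have failed, XORing the surviving
# data blocks with the parity block reconstructs its contents exactly.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]

# Silent corruption: flip one bit in one data block. A later parity
# check (e.g. during a scrub) notices the mismatch ...
corrupted = [data[0], b"\x10\x21", data[2]]
assert xor_parity(corrupted) != parity
# ... but nothing in the single parity block says *which* of the four
# blocks is wrong, so the error is detectable here but not correctable.
```

This is why a scrub can flag a stripe as inconsistent without being able to say whether a data block or the parity block itself is the bad one.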

As to whether RAID 10 can survive more disk failures: it can, depending on which disk fails second. RAID 10 is basically a bunch of RAID 1 arrays striped together. If the second disk to fail is the partner of the first one that failed (which is possible if the first disk failed because of a hotspot of data), then you'll lose all the data when the second disk fails, because that stripe is now broken. Whereas with RAID 5, if any disk fails as the second disk, you've lost the array.
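The second-failure odds above can be enumerated directly. This is a small sketch for a hypothetical 4-disk layout (disk names `d0`-`d3` are invented for illustration): RAID 10 survives any two-disk failure except losing both members of one mirror pair, while RAID 5 survives no two-disk failure at all.

```python
from itertools import combinations

# Hypothetical 4-disk RAID 10: two mirror pairs striped together.
mirror_pairs = [("d0", "d1"), ("d2", "d3")]

def raid10_survives(failed):
    # The array survives as long as no mirror pair loses both members.
    return all(not set(pair) <= set(failed) for pair in mirror_pairs)

def raid5_survives(failed):
    # RAID 5 tolerates exactly one failed disk, whichever one it is.
    return len(failed) <= 1

disks = ["d0", "d1", "d2", "d3"]
for pair in combinations(disks, 2):
    print(pair, "RAID 10:", raid10_survives(pair),
          "RAID 5:", raid5_survives(pair))
```

Of the six possible two-disk failures here, RAID 10 survives four; only the two that take out a whole mirror pair are fatal.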

In either case, backups to tape are mandatory for anything you can't afford to lose.