Why a URE fails a RAID rebuild and "renders RAID 5 unusable"

The problem is not lazy manufacturers or ancient technology. It is a misunderstanding of the goal of RAID *1. The goal of RAID is to keep the filesystem usable after a disk dies, not to replace a backup or guarantee a successful rebuild.


Let me expand on that with a practical example:
You are the IT guy for an office with 100 people. You need to build a fileserver for them.

Now if you used a single disk for that and the disk died, then 100 people would be picking their noses until you replaced the disk and restored the backups. And you would need to back up quite often (e.g. every day).

Now you use RAID. One disk dies, but the array remains available in a degraded state. All files are still accessible and everybody can continue working. At 8 PM *2 you run a new set of backups, shut down the server, replace the broken disk and restore the data, either with a rebuild or from backup. Everybody can continue to work and no data is lost.


Now there are a few assumptions here:

  1. You do have backups. Really, you should have them since RAID will not protect against some things like server theft, lightning, fire, ...
    RANT OVER.
  2. A disk rebuild can take a long time when you have large disks. This was fine with old 80 MB server-grade drives, but with huge (multi-TB) consumer drives it will take a long time. Restoring from backup might even be faster. For this reason alone you should make and test backups when you work with a 40 TB array.
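To get a feel for the numbers: a rebuild has to at least write every sector of the replacement disk once, so disk size divided by sustained throughput gives a rough lower bound on rebuild time. This is a minimal Python sketch; `rebuild_hours` is a hypothetical helper and the throughput figures are assumptions, not measurements. Real rebuilds can be slower, e.g. when the array keeps serving users at the same time.

```python
def rebuild_hours(disk_bytes: float, throughput_bytes_per_s: float) -> float:
    """Lower-bound rebuild time: touch the whole disk once at a steady rate."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return disk_bytes / throughput_bytes_per_s / 3600


if __name__ == "__main__":
    MB, TB = 10**6, 10**12
    # Hypothetical drive sizes and sustained rates, for illustration only.
    print(f"80 MB at 5 MB/s:  {rebuild_hours(80 * MB, 5 * MB) * 3600:.0f} s")
    print(f"8 TB at 150 MB/s: {rebuild_hours(8 * TB, 150 * MB):.1f} h")
```

With these assumed figures the 80 MB drive is done in about 16 seconds, while the 8 TB drive needs roughly 15 hours at best.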

Note that occasionally a sector on a disk will fail. This is a fact of life. It happens rarely, and drives have ways to work around it (reallocating sectors, also see TLER). But if you have huge disks and you rebuild the array, you are reading a huge number of sectors. The chance of running into a URE is small but non-zero. If it happens, fall back to backups.
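How small is "small"? A common back-of-envelope model treats the datasheet URE figure (consumer drives are often specified at 1 in 10^14 bits read, enterprise drives at 1 in 10^15; check your own datasheet) as an independent per-bit probability. That is a simplifying assumption: the datasheet number is a spec limit rather than a measured per-bit rate, and real errors are not independent, so treat the result as a pessimistic illustration, not a prediction. `p_ure_during_rebuild` is a hypothetical helper.

```python
import math


def p_ure_during_rebuild(bytes_read: float, ure_per_bit: float) -> float:
    """Naive chance of at least one URE while reading `bytes_read` bytes.

    Assumes every bit is an independent trial that fails with probability
    `ure_per_bit` (e.g. 1e-14 for a "1 in 10^14 bits" datasheet figure).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 so a tiny p does not round away.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    surviving = 3 * 4 * 10**12  # hypothetical: three surviving 4 TB disks
    for rate in (1e-14, 1e-15):
        print(f"URE rate {rate:g}/bit: {p_ure_during_rebuild(surviving, rate):.1%}")
```

For that hypothetical 12 TB read, this naive model gives roughly 62% at 1 in 10^14 and roughly 9% at 1 in 10^15, which shows why the sheer amount of data read during a rebuild matters.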


*1: RAID as in RAID 1 (mirror), RAID 5, RAID 6, or a combination like RAID 10.

*2: Or whenever everyone has gone home. An email announcing "emergency maintenance at 5 PM!" would help here.


No, the RAID manufacturers are not dumb or lazy.

To put it as simply as possible: if you're trying to rebuild data (especially from parity, as in RAID 5), and there's an Unrecoverable Read Error while reading the source you're rebuilding from, then it's impossible to properly rebuild the array from that corrupted source.
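Here is why, for a single stripe. With single parity, the parity block is the XOR of the data blocks, so any one missing block equals the XOR of everything else in the stripe. If a second block in that stripe is unreadable, there is nothing left to solve for it. A minimal Python sketch of RAID 5-style XOR reconstruction; `xor_blocks` and `rebuild_missing` are hypothetical names, and how a real controller reacts to the URE varies by implementation (this sketch simply raises).

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[Optional[bytes]]) -> bytes:
    """Reconstruct the lost block of one stripe from the remaining blocks.

    `surviving` holds the data and parity blocks still present in the stripe;
    None marks a block that hit an unrecoverable read error.
    """
    if any(block is None for block in surviving):
        # Single parity gives one equation; two unknowns cannot be solved.
        raise OSError("URE on a surviving disk: stripe cannot be rebuilt")
    return xor_blocks(surviving)


if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
    parity = xor_blocks([d0, d1, d2])
    # The disk holding d1 died: rebuild it from d0, d2 and parity.
    assert rebuild_missing([d0, d2, parity]) == d1
    # Same rebuild, but d2 is unreadable (URE): the stripe is lost.
    try:
        rebuild_missing([d0, None, parity])
    except OSError as err:
        print(err)
```

RAID 6 adds a second, independent parity block per stripe, which is what lets it survive one URE during a single-disk rebuild.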