Rebuild RAID5 with uncorrectable sectors on multiple disks
My software RAID5 (mdadm) array consists of five disks. Recently I have been getting I/O errors when reading certain files; most of the other files are still readable.
At first, I was planning to find out which disk is broken (using smartctl) and quickly replace the failed disk to rebuild the array before other disks fail as well. However, smartctl shows that three disks have uncorrectable errors.
I would think that mdadm should still be able to rebuild as long as the bad sectors of these three disks do not intersect, which would give me the option to swap and rebuild the disks one by one.
Or does the fact that I have an I/O error already indicate that parity is lost and the same sector on multiple disks is unreadable? Is there some way to find out whether or not any failing sectors intersect, and thus information is irreversibly lost?
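For reference, the per-disk checks I mean look roughly like this (the device name is a placeholder for each array member):

```
# SMART attributes; Current_Pending_Sector and Offline_Uncorrectable are the
# counters showing the uncorrectable errors
smartctl -A /dev/sda | grep -Ei 'pending|uncorrect'

# Full SMART output, including the device error log
smartctl -a /dev/sda
```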
The standard procedures are:
- Always have a good, up-to-date backup (at least two independent copies in different places, or at the very least on different media)
- Continuously monitor your RAID for problems. A RAID is worthless when errors are allowed to accumulate.
- Scrub the array at least monthly. This keeps errors from accumulating and catches bad sectors before a rebuild has to rely on them (see the commands sketched after this list).
- Consider RAID 6 with two redundant disks.
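For example, with Linux MD a scrub can be started and watched through sysfs and /proc; a minimal sketch, assuming the array is /dev/md0:

```
# Start a scrub: MD reads every stripe and verifies data against parity
echo check > /sys/block/md0/md/sync_action

# Watch the scrub's progress and the overall array state
cat /proc/mdstat
mdadm --detail /dev/md0
```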
You don't seem to have taken these seriously. Try to recover what's still there now. Trying to rebuild that nearly-failed array might cause more damage than you expect.
If the data is valuable enough, find a trustworthy and capable data recovery service, and put aside a four- to five-digit amount of cash. Otherwise, rinse and repeat: replace the disks, reformat, reinstall, and take the standard procedures more seriously.
-
You are correct in that if the unreadable sectors "don't intersect", i.e. they lie in different stripes, MD RAID may be able to recover the data using parity. But it may kick a drive out of the array during recovery, and then the chances decline significantly.
There is a general rule of data recovery: always begin with a raw dump. This guarantees you unlimited attempts: if you mess something up, you can start again from the dump. So in general, you clone all the dying disks to healthy ones, reading through the errors, and then assemble the RAID from the new disks.
You may start by cloning each drive sector-by-sector to a replacement with `ddrescue` (i.e. not by using the MD RAID recovery procedure). In addition to copying through errors, it creates what it calls a map file (older versions say "log file"), which is essentially a bad-sector map. Once you have cloned all three drives, you can compare those maps and find out whether there are any intersections. Don't throw them away; these maps will help you during the recovery.
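A sketch of that cloning step, assuming the failing member is /dev/sdb, its replacement is /dev/sde, and that GNU ddrescue and GNU awk are available (adjust device and file names to your setup):

```
# First pass: copy everything that reads easily, skip the slow scraping phase
ddrescue -f -n /dev/sdb /dev/sde sdb.map

# Second pass: go back to the bad areas with direct I/O and a few retries
ddrescue -f -d -r3 /dev/sdb /dev/sde sdb.map

# Crude intersection check between the bad ('-') ranges of two map files;
# no output means the unreadable byte ranges of the two disks do not overlap.
awk 'BEGIN { n = 0 }
     !/^#/ && NF == 3 && $3 == "-" {
         start = strtonum($1); end = start + strtonum($2)
         if (FILENAME == ARGV[1]) { s[n] = start; e[n] = end; n++ }
         else for (i = 0; i < n; i++)
             if (start < e[i] && end > s[i])
                 print "overlap near byte", (start > s[i] ? start : s[i])
     }' sdb.map sdc.map
```

Strictly speaking it is stripes, not raw byte ranges, that must not collide, so for a precise answer round each offset down to the chunk boundary (mdadm --examine shows the chunk size and data offset) before comparing.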
However, RAID5 is a very nasty beast when it comes to such dumps. What could go wrong? If a drive's sector doesn't read at all and throws an I/O error, the RAID layer will recover that data from the other disks; that would be the case with the old disks. But if a sector reads without errors yet returns wrong data, the RAID won't try to recover it from parity and will return the wrong data instead. `ddrescue` fills unreadable sectors with zeros, and those zeros are what comes back if you assemble the array from the clone later, so this translates into reading zeros (corrupted data) where it might still have been possible to recover the original data. RAID does not guarantee data integrity. This is a real problem for all variants except RAID6, which has two parity syndromes, and RAID1 with more than two mirrors. And, as you may already have guessed, the problem manifests itself in the most disruptive way with RAID5. (There is an additional argument against RAID5 concerning modern disk sizes and their unrecoverable bit error rates.)
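The map file is what lets you find those silent holes again later; a small sketch, again assuming GNU awk and the sdb.map file from the cloning step:

```
# List the regions that ddrescue could not read (status '-').
# On the clone these ranges contain zeros, so any file or stripe that
# overlaps one of them has to be treated as suspect.
awk '!/^#/ && NF == 3 && $3 == "-" {
    printf "hole at byte %d, length %d\n", strtonum($1), strtonum($2)
}' sdb.map
```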
During any cloning operation a disk may die completely, and then you are stuck. Recovery beyond that point is still possible, but it will cost you a lot: there are services with clean rooms that can, for example, replace the heads inside a hard disk and retry reading it. This is slow and error-prone, and they will likely charge you quite a lot. Consider it only if your data is very valuable.
Therefore, it is wise to clone the original disks but then put the clones away, assemble the array from the original disks, and try to clone from the array itself (`/dev/mdX`). If something goes wrong (a disk dies), replace it with its clone and manually repair the broken stripes afterwards (see the point about zero-filled sectors above), consulting the `ddrescue` map files. This is quite hard work. Note also that you need twice the original space to perform the recovery. Or don't do any of this yourself and outsource the whole job to specialists. This is the price you pay for improper maintenance of the array and its data.
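A sketch of that step, assuming the five members are /dev/sda1 through /dev/sde1, the array assembles as /dev/md0, and /mnt/backup has room for an image of the whole array:

```
# Assemble the ailing array read-only from the original disks
mdadm --assemble --force --readonly /dev/md0 /dev/sd[abcde]1

# Image the array itself, reading through errors and keeping a map of them
ddrescue /dev/md0 /mnt/backup/array.img array.map
```

The --readonly keeps MD from starting a resync or writing anything back to the ailing members while you read.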
And now you have this precious experience. Don't blame the arrays, blame yourself; learn the lesson and manage them correctly:
- Think three times before using RAID5. Then say "no" and go for another RAID level.
- Scrub the array regularly. This means MD RAID reads and compares the data on the drives and rings a bell if something is wrong (a mismatch or an unreadable block). Then you can replace a misbehaving drive at the first symptoms. Good distros ship this configured out of the box (Debian, at least).
- Monitor the disks and the array so that you don't miss important signs of problems (a minimal example follows this list).
- Finally, welcome to the club of administrators who regularly back up their data.
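A minimal sketch of the scrubbing and monitoring side on a Debian-style mdadm setup; the paths, mail address, and exact flags are assumptions to adapt to your system:

```
# In /etc/mdadm/mdadm.conf, set where failure and degradation events are mailed:
#   MAILADDR admin@example.com

# The monitor daemon that sends those mails (Debian's mdadm package starts it for you)
mdadm --monitor --scan --daemonise

# Debian also schedules a monthly scrub from /etc/cron.d/mdadm, roughly equivalent to:
/usr/share/mdadm/checkarray --all
```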