DL380 G5, RAID5, ext3, RAID Failed

I'm sorry. But this is operator error.

You had two failing disks on a RAID5 array, and you removed more disks than the array could sustain.

Doing this without any backups is the bigger mistake.

You should contact a data recovery firm to attempt to retrieve the data from the broken Logical Drive.


Do not power the system back on. Shut it down and call a data recovery service; there are a number of them that handle remote recovery of exactly this type of failure. At this point, anything you do yourself is only likely to make things worse.

This often involves connecting all drives directly to a known-good HBA (not a RAID card or other controller!) and booting a specific downloadable Linux image with remote management tools. The company then remotely accesses the system, assesses the state of each disk, and recovers whatever RAID metadata is left. Using proprietary software, they can re-assemble a virtual RAID disk (technical detail: often something that plugs into the standard Linux device-mapper layer). This exposes the RAID read-only, entirely in software, with no RAID controller involved. The next steps are verifying the data is not corrupted beyond use and cloning the virtual disk to a new disk to complete the recovery. After that you can worry about getting the system back up and running.
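To make that a little more concrete: a Smart Array RAID5 cannot be assembled by stock Linux tools (the controller uses HP's own on-disk metadata, which is exactly why the recovery firms bring proprietary software), but the read-only, software-level reassembly they perform is conceptually very close to assembling a Linux md array read-only. A rough sketch, with /dev/sdb through /dev/sde standing in for your member disks:

    # Inspect whatever RAID metadata is still present on each member disk
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # Assemble the array strictly read-only so nothing gets written to the members
    mdadm --assemble --readonly /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # Mount the contained filesystem read-only as well before copying anything off
    mount -o ro /dev/md0 /mnt/recovery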

While I'm not going to name any services here, most of them are easy to find. The ones that work remotely spare you the round trip of shipping the RAID drives plus a recovery drive to them, waiting for the recovery and clone, and having it all shipped back, and they have the added benefit that the data never actually leaves your facility.


A small amount of good news: as long as the RAID controller (or you) didn't write any new data to any of the disks, and the pre-fail warnings haven't turned into outright failures, the chances that a good data recovery team can restore all of it are excellent, and reasonably quickly too.


Re: restoring the old drives.

Since your RAID is completely dead as it stands, you have little to lose by refitting the two pre-fail drives.

Do install them in the original bays.

Remember they're pre-fail, not failed outright, so there is a fair chance they will run for long enough to rescue your data.
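If you want to see how close to the edge those drives are before committing, you can query them through the controller. Assuming HP's Array Configuration Utility CLI (hpacucli) and smartmontools are installed, and that the old cciss driver is in use (the drive and device numbers here are placeholders, not your real values), something like:

    # Controller, logical drive and physical drive status in one go
    hpacucli ctrl all show config

    # SMART data for an individual drive sitting behind the Smart Array (drive 0 here)
    smartctl -a -d cciss,0 /dev/cciss/c0d0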

There is a chance the RAID simply won't come up, a small chance the controller will ask to "reset" the RAID (choose NO/CANCEL), and a tiny chance the controller will reset it automatically, which would negate any value a data recovery firm could add.

So if the RAID does come up, your utter top priority is to get the data off. That means having at least 1.2 TB of space available and ready to receive it, and a tool like robocopy or xcopy32, or in your Linux case rsync, ready to run. You don't want to waste time reading man pages and figuring out syntax while your drives are burning through their last minutes.
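Something along these lines, ready to paste, is enough; the mount points are obviously placeholders for wherever the old logical drive and your rescue space actually end up:

    # -a preserves permissions, ownership, timestamps and symlinks; -H keeps hard links
    # --progress shows it is still moving; --partial lets you resume if a drive stalls
    rsync -aH --partial --progress /mnt/old-raid/ /mnt/rescue/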


Once your data is safe, recreate the RAID as a RAID6 with the new drives. You'll drop 300 GB of capacity, but gain two-drive fault tolerance. Or add an additional drive and consider a RAID10 across 6 drives. Or consider retiring this machine completely; the G5 is over 10 years old and really isn't suitable for important production tasks any more.
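For what it's worth, the RAID6 rebuild can be done from the OS with hpacucli as well. The slot number and the drives= selector below are assumptions (check "hpacucli ctrl all show config" for your real values first), and RAID6/ADG on controllers of that generation may require the battery-backed cache to be fitted:

    # Create a new RAID6 logical drive from all currently unassigned physical drives
    hpacucli ctrl slot=0 create type=ld drives=allunassigned raid=6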

And not trying to put the boot in, but do set up a proper backup solution too. There will be a next time.