Recoverability of Windows 10 storage pool (Software RAID 5)

I'm currently looking into RAID 5 for my growing data on my home PC, and some things are still unclear to me. At the moment each of my data drives is backed up by an additional "backup" drive, and I rely solely on custom scripts to back up my data frequently. I currently have 4 drives of data plus 4 drives to back them up (plus 1 junk-data drive without backup). Most of them are external USB drives. I'll soon be running out of storage again, so I'm looking into better options. I've had my current lazy backup strategy going for about 15 years. Unfortunately my main limiting factor is cost, and I can't simply buy a whole set of new, more suitable drives to do whatever would be the best option. Needing to double the number of drives with every expansion is already expensive.

Thus I've been looking into RAID 5 and considering setting up three RAID 5 arrays from my 9 existing drives (mostly because the drives only match in size in groups of three). I would be using the parity option of Windows 10 Storage Spaces for this, and I'll refer to it as RAID for simplicity.

Questions

1. I read that (obviously) USB drives are not a good option for RAID due to throughput limits, which I understand, but I couldn't find out to what extent. Should I expect a major read/write slowdown compared to a single drive? Would the RAID 5 just be as slow as the slowest USB drive in the array? Or could there even be a slight speedup?

2. I don't understand the statement I find everywhere that a single URE (unrecoverable read error) in addition to one drive failure can mean a loss of the whole RAID. I've encountered partial drive failures, but never has one of my drives just completely died without warning in the last 15 years. If only some sectors are corrupt, would I not just lose the data on the stripes corresponding to those sectors, as opposed to the whole RAID?

3. Assuming I only have some sectors fail on one drive, how does the rebuilding work? Do I have to completely remove the affected drive and rebuild it all? Or can I use a 4th drive outside the RAID, copy all non-corrupt sectors to it, and only rebuild the corrupt ones, also writing them to that 4th drive? Even more importantly, what about the case where 2 sectors go corrupt on 2 different drives, but also in 2 different RAID stripes? I don't see why this would be a data loss, other than the software not being able to handle it. But so far what I've read seems to suggest that it is a data loss. If that's the case I don't think I will use RAID, as this seems quite unreasonable.

4. To my understanding the highest risk comes when one drive fails completely, and I have to read all the data from all the other drives to rebuild it. It did not become clear to me how stressful this is. In my current setup, I verify my data every so often by hashing every file on the data drive and the backup drive and comparing it to a list of hashes saved elsewhere. I assume this is just as stressful as rebuilding a RAID drive?

Thanks for any help


I'm not familiar with Storage Spaces specifically, so I'll try to keep this as generic as possible.


  1. I read that (obviously) USB drives are not a good option for RAID due to throughput limits […]

More importantly, USB drives won't be reliable enough. See What would happen when USB devices draw more power than the hub can provide?. RAID only makes sense if it improves reliability, and anything but a permanent connection is not reliable enough.

  2. […] I've encountered partial drive failures, but never has one of my drives just completely died without warning in the last 15 years.

You have been lucky so far, but you shouldn't assume that this will continue. Sudden drive failures are a thing. Also, counter-intuitively, fresh-out-of-the-box drives are more susceptible to failure. See the bathtub curve.

If only some sectors are corrupt, would I not just lose the data on the stripes corresponding to those sectors, as opposed to the whole RAID?

That's not what RAID controllers (in hardware or software) usually do. It's assumed that you have backups, and failing hard is better than silently ignoring the error, because then it's not possible for you to mistakenly assume that everything is fine. This kind of failure is assumed to be an extreme case and you should account for it in your risk assessment. If you're not ready to restore from backups in such a case, then you need more redundancy.
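For a sense of why this scenario gets so much attention, here's a back-of-the-envelope sketch in Python using the often-quoted consumer-drive spec of one URE per 10^14 bits read. The spec value and drive sizes below are assumptions, and real-world error rates are hotly debated, so treat the result as an illustration rather than a prediction.

```python
import math

# Back-of-the-envelope: chance of hitting at least one URE while reading
# the surviving drives during a RAID 5 rebuild. All numbers are assumptions.
ure_per_bit = 1e-14    # often-quoted consumer spec: 1 URE per 10^14 bits read
drive_tb = 4.0         # example drive size in TB
surviving_drives = 2   # 3-drive RAID 5 with one drive failed

bits_to_read = surviving_drives * drive_tb * 1e12 * 8
# Poisson approximation for "at least one error" over that many bits
p_ure = 1 - math.exp(-ure_per_bit * bits_to_read)
print(f"~{p_ure:.0%} chance of at least one URE during a full rebuild")
```

With these example numbers the result comes out around 47%, which is why the fail-hard behaviour and the "you still need backups" advice feature so prominently in RAID 5 discussions.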

  3. Assuming I only have some sectors fail on one drive, how does the rebuilding work? Do I have to completely remove the affected drive and rebuild it all? Or can I use a 4th drive outside the RAID, copy all non-corrupt sectors to it, and only rebuild the corrupt ones, also writing it to that 4th drive?

Under ideal conditions, copying good sectors should be effectively no different from rebuilding them, except that rebuilding stresses the array more. In a less-than-ideal situation, you'll be copying sectors that look fine but contain corrupted data. You're not able to detect this without comparing against the rest of the drives, which is exactly what rebuilding does: it assumes that their data is more likely to be correct than the data on the failing drive.
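For intuition, single parity is XOR-based: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from all the others. Here is a toy sketch of the principle (Storage Spaces' parity layout is based on the same idea, but the real on-disk format is more involved):

```python
# Toy RAID 5 parity: parity = XOR of the data blocks in a stripe,
# so any single lost block can be recomputed from the remaining ones.
def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

d1, d2 = b"hello world!", b"raid 5 demo."   # two equally sized data blocks
parity = xor_blocks(d1, d2)                 # what the array stores on the third drive
rebuilt_d1 = xor_blocks(d2, parity)         # "rebuild": recompute the lost block
assert rebuilt_d1 == d1
```

The flip side is also visible here: if one of the surviving blocks is itself corrupt, the reconstruction silently produces garbage, which is why a rebuild has no choice but to trust the remaining drives.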

Even more importantly, what about the case when 2 sectors on 2 different drives, but also in 2 different RAID-stripes go corrupt? I don't see why this would be a data loss, other than the Software not being able to handle it. But so far what I read seems to suggest that this is a data loss. If this is the case I don't think I will use RAID, as this seems quite unreasonable.

This is a tricky situation and I'm not sure what would happen. I'd test this by setting up a small RAID in a virtual machine and simulating this situation by intentionally corrupting raw disk data. It's not a perfect testing method because corrupted data != unreadable data, but if it works for corruption, it should work for bad sectors. On Linux you could use dm-integrity + intentional corruption to better simulate this scenario.
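If you go the VM route, corrupting a raw image can be as simple as overwriting a few bytes at some offset. A minimal sketch, assuming a raw-format image (not qcow2/VHDX) belonging to a powered-off, throwaway test VM; the path and offset are placeholders:

```python
# Flip 16 bytes at an arbitrary offset of a raw disk image to simulate
# silent corruption. Only ever do this to a disposable test image.
IMAGE = "test-disk.img"      # placeholder path to a raw-format VM disk image
OFFSET = 64 * 1024 * 1024    # arbitrary offset well inside the data area

with open(IMAGE, "r+b") as f:
    f.seek(OFFSET)
    original = f.read(16)
    f.seek(OFFSET)
    f.write(bytes(b ^ 0xFF for b in original))   # invert the bytes in place
```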

  4. To my understanding the highest risk comes when one drive fails completely, and I have to read all the data from all the other drives to rebuild it. It did not become clear to me how stressful this is. In my current setup, I verify my data every so often by hashing every file on the data drive and the backup drive and comparing it to a list of hashes saved elsewhere. I assume this is just as stressful as rebuilding a RAID drive?

A rebuild will generally involve reading all sectors from all drives, which is more taxing than reading every file from all drives, as you currently do. Plain old RAID doesn't distinguish used space from free space, because the filesystem is layered on top of it. Filesystems with integrated RAID, like ZFS and btrfs, do distinguish free space from used space, and they also have more robust error correction than plain RAID. They aren't available on Windows, though, and I'm not aware of any Windows-native alternative. A third-party btrfs driver for Windows exists, but I'm not sure how reliable it is.
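For comparison, a file-level verification pass like the one you describe only ever touches the files themselves. A minimal sketch of that kind of script, where the paths and the tab-separated "relative-path, sha256" list format are assumptions:

```python
import hashlib
import pathlib

DATA_ROOT = pathlib.Path("D:/data")        # placeholder: data drive to verify
HASH_LIST = pathlib.Path("E:/hashes.tsv")  # placeholder: saved "path<TAB>sha256" list

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = dict(line.rstrip("\n").split("\t")
                for line in HASH_LIST.open(encoding="utf-8") if line.strip())
for file in sorted(DATA_ROOT.rglob("*")):
    if file.is_file():
        rel = file.relative_to(DATA_ROOT).as_posix()
        if expected.get(rel) != sha256_of(file):
            print("MISMATCH or not in hash list:", rel)
```

Unlike a rebuild, that loop never reads free space, so on a half-empty drive it does roughly half the work.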

The additional risk during a rebuild comes from the fact that RAID assumes drives fail independently, so it's considered unlikely that two drives will fail in quick succession. In real life that's not always the case: failures of similar drives tend to be correlated, because they share the same design and the same manufacturing defects or imperfections, and a rebuild puts the same stress on all of the surviving drives at once.


Finally, setting up any form of RAID involves wiping the drives, so you won't be able to set this up without some spares.

If your USB drives are 3.5" models, then you can most likely extract the bare internal hard drives from their enclosures and use them like regular SATA drives.

I'd consider buying a NAS and consolidating all your storage into a single, self-contained device. With Synology NASes you can even mix and match different drive sizes in a RAID without wasting extra space in the larger drives. It's a bit magical, but it works reliably, has very nice GUI and it doesn't use any proprietary tech - if all else fails, you can pop these drives into any PC and read them by booting Ubuntu from USB.