mdadm 3-way RAID 1 - good solution for guaranteed 2-drive failure tolerance?
To have a single array capable of tolerating a 2-disk failure, you have two choices:
- three-way RAID1, as you suggested
- RAID6, as another possibility.
What is the best choice? It depends on what you are trying to achieve.
- if you want a setup that gives you the possibility of taking out a disk, installing it in another computer, and still being able to read your data, use RAID1;
- if you want to be able to expand your array and gain additional space each time you add a disk, use RAID6 (see the example commands after this list).
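As a minimal sketch of each layout (device names like /dev/sdb are placeholders, not from the question), the two arrays could be created like this:

```
# 3-way RAID1: every disk holds a full copy, so any 2 disks can fail.
mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

# 4-disk RAID6: dual parity, so any 2 disks can fail; the array can
# later be grown with more disks to gain capacity.
mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
```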
A note about RAID1 performance degradation: it does not depend on bus congestion, but rather on how the mean disk access time is affected by multiple writes. Disk access time is composed of two different parts: seek latency (the time the head needs to reach the correct track) and rotational delay (the time the platter needs to rotate to the correct position).
When multiple disks are involved in identical writes, the rotational delay as measured by the host will be the worst of all the involved disks. Seek time, on the other hand, should be relatively similar between the RAID1-ed disks. In the end, this means that RAID1 arrays will have slightly lower write IOPS than a single identical disk.
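For a rough worked example (assuming 7200 RPM disks whose platter positions are independent): a full rotation takes about 8.3 ms, so a single disk's average rotational delay is half a rotation, roughly 4.2 ms. With three mirrors the host has to wait for the slowest copy, and the expected worst of three independent delays is about 3/4 of a rotation, roughly 6.2 ms, so average write latency rises even though no individual disk got any slower.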
Linux's mdadm has an interesting provision to minimize the impact of differing disk latencies. For example, read the man page entries about "write-behind" and "write-mostly":
- -W, --write-mostly: subsequent devices listed in a --build, --create, or --add command will be flagged as 'write-mostly'. This is valid for RAID1 only and means that the 'md' driver will avoid reading from these devices if at all possible. This can be useful if mirroring over a slow link.
- --write-behind=: specify that write-behind mode should be enabled (valid for RAID1 only). If an argument is specified, it will set the maximum number of outstanding writes allowed. The default value is 256. A write-intent bitmap is required in order to use write-behind mode, and write-behind is only attempted on drives marked as write-mostly.
Note that this will lower your random read IOPS (as some disks will effectively be used for writes only), so pick your poison carefully.
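As a sketch of how these options combine (device names are assumptions; the internal bitmap satisfies the write-intent-bitmap requirement quoted above):

```
# /dev/sdc is the slow mirror in this example. --write-mostly flags the
# devices listed after it; --write-behind needs the write-intent bitmap.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/sdb --write-mostly /dev/sdc
```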
Yes, you can add as many mirrors to a RAID1 as you like, and the array will tolerate the failure of all but one device. If you add 10 devices, you can tolerate the failure of 9 of them.
Don't forget there will be a write penalty for this setup, though. All data has to be written to every device. Generally it should be fairly insignificant, but if all devices are on the same controller/bus then you may start to notice the delays as your data is written to every device. For example, with 3 devices, writing 1 MB of data to the array requires the controller/bus to actually write 3 MB to disk.
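If you already have a two-disk RAID1, a sketch of adding a third mirror (device names assumed) would be:

```
mdadm --add /dev/md0 /dev/sdd            # the new disk joins as a spare
mdadm --grow /dev/md0 --raid-devices=3   # make it an active third mirror
```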
Another solution is RAID 6 with 3 disks. See this post:
Minimum number of disks to implement RAID6
RAID 6 will also allow doubling the capacity by adding a fourth drive, as sketched below. I have had 2 drives fail on an array and not lost data.
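Assuming an existing 3-disk RAID6 at /dev/md0 and a new drive at /dev/sde (both placeholders), the capacity-doubling step could look like:

```
mdadm --add /dev/md0 /dev/sde            # add the fourth drive as a spare
mdadm --grow /dev/md0 --raid-devices=4   # reshape: usable space goes from 1 to 2 disks
cat /proc/mdstat                         # watch the reshape progress
```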
First, I think it's important to note the usage scenario and the quality of the components used. It's not the same if you're using desktop HDDs and cheap RAID controllers as it is if you're going full enterprise hardware.
If the only thing you're doing is replication across HDDs (RAID1), then you can afford to lose n-1 hard drives and still have all the data intact.
But I'd really like to know: what are the usage scenario and hardware selection that make you so concerned about losing 2 drives simultaneously?
Recently, I set up a webserver for an ISP. The server had a 6-port RAID controller, so I set up RAID 60 as a good tradeoff between speed and safety.
I advise you to read through this link.
In regard to your clarification, I strongly suggest going for either RAID 5 or RAID 60... Alternatively, if cost is the issue, a simple RAID 0 with a two-tier offsite backup would be enough.
My references are my own experiences setting up numerous servers in vastly different usage scenarios.
I have always been a big fan of hardware-based RAID 5. I typically use Ubuntu Linux for the server if the planned use allows. With hardware-based RAID, Ubuntu (like any other operating system) has no trouble booting from a RAID-5 array on most modern servers.

I also use multiple backups. The first level is an hourly backup at the server to an external drive using Back In Time, providing an on-site backup every hour during business hours. The second level is a nightly backup of the network share drives to another computer running Ubuntu and Back In Time. The nightly backups are also made to portable USB drives, and at least one is kept off-site; drives are rotated daily during the business week. The third level is a retired Windows Vista computer running Ubuntu Linux, configured similarly to the server, where each night the server files are synchronized to the backup system using the Linux utility rsync.

RAID-5 (with a hot spare) served well over the last few years when there were drive failures: the failed (hot-swappable) drive was replaced in each instance without interrupting network activities. RAID-5 didn't help when the server experienced a hard crash, probably from motherboard or memory failure. What did help was the spare backup server holding the files synchronized after close of business the previous night. I have a small script that migrates the server configuration to the backup server, including all the user and machine accounts, making the spare computer a temporary PDC. It took a couple of hours to put together another retired Windows computer as a new backup system and put it online.

I opted to replace the more expensive ProLiant ML350 server with a more modest ProLiant ML10. I will be configuring the new server with RAID-1 as a 3-drive mirror with a hot spare. The ML10 I ordered uses a software RAID controller, which has to be configured as AHCI instead of RAID for Ubuntu to boot. The total cost for the server and four 1 TB drives is about the cost of one 300 GB drive for the ML350. This is the second time in 25 years of managing servers that RAID-5 didn't help (the first time was probably a failure of the RAID controller). Neither instance was a problem with RAID itself, just a problem that comes with using technology.
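As a rough sketch of that nightly rsync step (the paths and hostname here are my assumptions, not the author's):

```
# Mirror the server's share tree to the backup box. -aH preserves
# permissions, ownership, and hard links; --delete keeps the copy exact.
rsync -aH --delete /srv/shares/ backup-server:/srv/shares/
```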
The main point I want to get across is: be prepared for when your server failure occurs, and have a good backup plan. For a backup plan to be good, you have to actually test the backup and recovery procedures. In the case of the most recent failure, the total time from when I got the call getting me out of bed, getting dressed, grabbing a quick bite to eat on the way out the door, driving to the site (a 10-minute drive), diagnosing the problem (including an attempted restart of the server), and getting the backup server online was 52 minutes.
You can have discussions about which of the various RAID levels is best. Just keep in mind that more things can fail than hard drives. Use the type of RAID you think best for your use case, but plan for recovery from hardware failure or a malware/virus attack.