Wisdom of adding USB external drive to Linux RAID10 array
I have a SAN system with ten drive slots set up with software RAID10, and all the md0-5 added into a single volume group. The SATA port in slot 10 recently failed and will not accept any drive we put in there. I'm extremely nervous about leaving drive 9 unmirrored. My proposed solution is to add a USB external drive (matching the size and manufacturer of drive #9) to the server and assign it as the RAID1 partner for #9. I realize that USB is going to be much slower than SATA, but I am more concerned about data protection than drive speed.
Does anyone see any issues with that plan (other than performance)?
cat /proc/mdstat

Personalities : [raid1]
md4 : active raid1 sdj1[1]
      976759936 blocks [2/1] [U_]

md3 : active raid1 sdc1[1] sda1[0]
      976759936 blocks [2/2] [UU]

md2 : active raid1 sdh1[1] sdg1[0]
      976759936 blocks [2/2] [UU]

md1 : active raid1 sdi1[0] sde1[1]
      976759936 blocks [2/2] [UU]

md0 : active raid1 sdf1[0] sdb1[1]
      976759936 blocks [2/2] [UU]
Solution 1:
RAID10 is a RAID0 of RAID1 arrays: you would end up with just one volume in the end, so you would have a single physical volume to give to LVM. Like so:
LV1        LV2
  \__________\___________....
                       |
                       VG
                       |
                       PV
                       |
 _____________________MD5_____________________
/          /           |          \          \
_MD0_    _MD1_       _MD2_      _MD3_      _MD4_
/    \   /    \      /    \     /    \     /    \
D01  D02 D03 D04    D05  D06   D07  D08   D09  D10
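For illustration, that nested layout would typically be built with something along these lines (the device names, md numbers and the volume group name vg0 are examples only, not taken from your system):

    # five RAID1 pairs, then a RAID0 striped across them (classic nested RAID10)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # ...repeat for md1, md2, md3 and md4 with the remaining four pairs...
    mdadm --create /dev/md5 --level=0 --raid-devices=5 \
          /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4
    # the top-level array is the single physical volume handed to LVM
    pvcreate /dev/md5
    vgcreate vg0 /dev/md5

(The md "raid10" personality can also build the equivalent in a single mdadm --create --level=10 call across all ten drives.)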
What you describe with "all the md0-5 added into a single volume group" sounds like five separate RAID1 arrays (or RAID10 - the RAID10 driver essentially acts as RAID1 for an array of two drives) which you have added to LVM separately, so you have a volume group consisting of five physical volumes. Like so:
LV1        LV2
  \__________\___________....
                       |
 _____________________VG______________________
/          /           |          \          \
PV1       PV2         PV3        PV4        PV5
 |         |           |          |          |
_MD0_    _MD1_       _MD2_      _MD3_      _MD4_
/    \   /    \      /    \     /    \     /    \
D01  D02 D03 D04    D05  D06   D07  D08   D09  D10
(This isn't actually RAID10 (RAID-1-then-0); it is RAID-1-then-JBOD.)
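By contrast, that second layout is usually the result of something like this (again, the device names and the volume group name vg0 are illustrative only, with md0's members taken from your mdstat output):

    # five independent RAID1 pairs
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdb1
    # ...repeat for md1 through md4 with the remaining pairs...
    # each array becomes its own physical volume in the same volume group
    pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4
    vgcreate vg0 /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4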
Is this the case?
If so, then you could instead just remove PV5 from the volume group, assuming there is enough free space in the system in total and the filesystems you have support being resized (e.g. ext2/3/4 with resize2fs) if needed:
- Reduce the filesystems and the logical volumes that contain them until there is at least enough free space in the volume group to fill PV5, unless there is already enough free space in the volume group.
- Use pvmove to move all blocks allocated to that physical volume by LVM to the others.
- (optional) Use vgreduce to remove PV5 from the volume group.
Now the broken array is not part of the LVM setup. You can add it back once you have fixed the situation so that the RAID1 pair is no longer running degraded. A rough command-line sketch of these steps follows below.
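The sketch assumes the degraded pair is md4 (as in your mdstat output) and uses placeholder names vg0 and lv_data for the volume group and a logical volume - check pvs/vgs/lvs for the real names, sizes and free space before running anything:

    # 1. shrink a filesystem and its LV together until the VG has at least
    #    the size of the degraded PV free (ext filesystems must be unmounted
    #    to shrink; "-1T" is a placeholder amount)
    lvreduce --resizefs -L -1T /dev/vg0/lv_data

    # 2. move every allocated extent off the degraded physical volume
    pvmove /dev/md4

    # 3. (optional) drop the now-empty PV from the volume group
    vgreduce vg0 /dev/md4

    # later, once the port is fixed and a replacement disk is in place:
    mdadm --manage /dev/md4 --add /dev/sdX1   # sdX1 = the replacement partition
    vgextend vg0 /dev/md4                     # put the PV back into the VG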
To actually answer your question...
Other than performance issues, which you've already identified, and the chance of a USB drive being accidentally disconnected (which is unlikely if the machine that hosts your SAN is safely out of the way of humans and other disturbances), I see no problem with replacing your disk 10 with one connected via USB.
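If you do go the USB route, hot-adding the disk to the degraded pair is the standard mdadm procedure. Roughly, assuming the degraded array is md4 and the USB disk appears as /dev/sdk (a placeholder - check dmesg or lsblk for the real device name):

    # copy the partition layout from the surviving disk (MBR partition tables)
    sfdisk -d /dev/sdj | sfdisk /dev/sdk
    # add the new partition; md starts rebuilding the mirror immediately
    mdadm --manage /dev/md4 --add /dev/sdk1
    # watch the resync progress
    watch cat /proc/mdstat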
If the machine that hosts your SAN has a spare PCI or PCI-E slot, I would instead suggest taking that route and adding an extra SATA controller to hang the drive off. If you get a controller that offers five ports (or can fit two cards that offer five or more in total), I would be tempted to split the drives up so each pair has one drive connected to the motherboard and one connected to the add-on controller - that way your whole array has a better chance of surviving a motherboard controller failure that kills all the drives attached to it (a very, very rare occurrence, but it could happen).
In either case, if you do have five separate arrays each acting as a physical volume for LVM (rather than one array, and so one PV, in LVM), I would recommend getting the data off the degraded pair at least temporarily unless you can add the replacement drive right now.
(To confirm the layout you have, it would be worth rewording your question and/or adding the output of the commands cat /proc/mdstat, pvs, vgs and lvs.)
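For example, this is enough to show how the arrays map onto LVM:

    cat /proc/mdstat   # array health and membership
    pvs                # physical volumes and the VG each belongs to
    vgs                # volume groups, total and free space
    lvs                # logical volumes and their sizes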
Solution 2:
It's RAID 10; I'd be less concerned about the array's health with one disk dead than about using a USB drive. If it had been RAID 5 it might be a different matter, but I think you'll be fine without a tenth disk until you get around to fixing your controller - so long as you're sorting that out soon - you are, right? :)