Wisdom of adding USB external Drive to Linux RAID10 array

I have a SAN system with 10 drive slots set up with software RAID10, and all the md0-5 added into a single volume group. The SATA port in slot 10 recently failed and will not accept any drive we put in there. I'm extremely nervous about leaving drive 9 unmirrored. My proposed solution is to add a USB external drive (matching drive #9 in size and manufacturer) to the server and assign it as the RAID1 partner for #9. I realize that USB is going to be much slower than SATA, but I am more concerned about data protection than drive speed.

Does anyone see any issues with that plan (other than performance)?

cat /proc/mdstat

Personalities : [raid1]
md4 : active raid1 sdj1[1]
      976759936 blocks [2/1] [U_]

md3 : active raid1 sdc1[1] sda1[0]
      976759936 blocks [2/2] [UU]

md2 : active raid1 sdh1[1] sdg1[0]
      976759936 blocks [2/2] [UU]

md4 : active raid1 sdi1[0] sde1[1]
      976759936 blocks [2/2] [UU]

md0 : active raid1 sdf1[0] sdb1[1]
      976759936 blocks [2/2] [UU]


Solution 1:

RAID10 is a RAID0 of RAID1 arrays, so you would end up with just one array in the end, and therefore one physical volume to give to LVM. Like so:

            LV1        LV2              
             \__________\___________....
                            |
                           VG
                            |
                           PV
                            |
     ______________________MD5________________________
    /             /           |          \            \
  _MD0_        _MD1_        _MD2_       _MD3_        _MD4_        
 /     \      /     \      /     \     /     \      /     \
D01   D02    D03   D04    D05   D06   D07   D08    D09   D10
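
For reference, that single-array layout would typically be built along these lines. This is only a rough sketch with hypothetical device and volume group names (/dev/sd[a-j]1, vg_san), not a description of your actual setup:

    # One 10-drive RAID10 array: five mirrored pairs striped together
    mdadm --create /dev/md5 --level=10 --raid-devices=10 /dev/sd[a-j]1
    # The whole array becomes a single physical volume in one volume group
    pvcreate /dev/md5
    vgcreate vg_san /dev/md5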

What you describe with "all the md0-5 added into a single volume group" sounds like 5 separate RAID1 (or RAID10 - the RAID10 driver essentially acts as RAID1 for arrays of two drives) arrays which you have added to LVM separately, so you have a volume group consisting of 5 physical volumes. Like so:

            LV1        LV2              
             \__________\___________....
                            |
     ______________________VG_________________________
    /             /           |          \            \
   PV1          PV2          PV3         PV4          PV5
    |            |            |           |            |
  _MD0_        _MD1_        _MD2_       _MD3_        _MD4_        
 /     \      /     \      /     \     /     \      /     \
D01   D02    D03   D04    D05   D06   D07   D08    D09   D10

(This isn't actually RAID10 (RAID-1-then-0); it is RAID-1-then-JBOD.)
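
For comparison, the separate-PV layout above would normally come from something like this (again just a sketch; vg_san is a hypothetical name, and only md0's members, sdf1 and sdb1, are taken from your mdstat output):

    # Five independent RAID1 pairs (md1 to md4 created the same way)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdb1
    # Each array is added to LVM as its own physical volume
    pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4
    vgcreate vg_san /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4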

Is this the case?

If so, then you could instead just remove PV5 from the volume group, assuming there is enough free space in the system in total and the filesystems you have support being resized (e.g. ext2/3/4 with resize2fs) if needed:

  1. Shrink the filesystems and the logical volumes that contain them until there is at least enough free space in the volume group to absorb the contents of PV5, unless there is already enough free space in the volume group.
  2. Use pvmove to move all extents allocated on that physical volume to the others.
  3. (Optional) Use vgreduce to remove PV5 from the volume group (see the command sketch below).
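
As a rough sketch of steps 1-3, assuming the degraded pair is /dev/md4 (as your mdstat output suggests) and using hypothetical volume group and LV names (vg_san, lv_data) and a placeholder size:

    # If needed, shrink an LV and the filesystem inside it in one step
    # (--resizefs runs resize2fs for you; adjust the size to your situation)
    lvreduce --resizefs -L -1T /dev/vg_san/lv_data
    # Move all allocated extents off the degraded PV onto the remaining PVs
    pvmove /dev/md4
    # (Optional) Remove the now-empty PV from the volume group
    vgreduce vg_san /dev/md4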

Now the broken array is no longer part of the LVM setup. You can add it back once you have fixed the situation and the RAID1 pair is no longer running degraded.
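
Re-adding it later is just a matter of extending the volume group again (hypothetical names as before):

    # Put the repaired, fully-mirrored array back into the volume group
    vgextend vg_san /dev/md4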

To actually answer your question...

Other than the performance issues, which you've already identified, and the chance of a USB drive being accidentally disconnected (which is unlikely if the machine that hosts your SAN is safely out of the way of humans and other disturbances), I see no problem with replacing your disk 10 with one connected via USB.
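
If you do go that route, attaching the USB disk to the degraded mirror is straightforward. A sketch, assuming the USB disk appears as /dev/sdk and the degraded pair is /dev/md4 with /dev/sdj as its surviving member (check your own device names before running anything like this):

    # Copy the surviving member's partition table to the new disk (MBR layout)
    sfdisk -d /dev/sdj | sfdisk /dev/sdk
    # Add the new partition to the degraded mirror and let it resync
    mdadm --manage /dev/md4 --add /dev/sdk1
    # Watch the rebuild progress
    cat /proc/mdstat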

If the machine that hosts your SAN has a spare PCI or PCI-E slot, I would instead suggest adding an extra SATA controller to hang the drive off. If you get a controller that offers five ports (or can fit two cards that offer five or more in total), I would be tempted to split the drives up so each pair has one drive connected to the motherboard and one connected to the add-on controller - that way your whole array has a better chance of surviving a motherboard controller failure that kills all the drives attached to it (a very, very rare occurrence, but it could happen).

In either case, if you do have five separate arrays each presented to LVM as its own physical volume (rather than one array and so one PV in LVM), I would recommend getting the data off the degraded pair, at least temporarily, unless you can add the replacement drive right now.

(To confirm the layout you have, it would be worth rewording your question and/or adding the output of the commands cat /proc/mdstat, pvs, vgs and lvs.)

Solution 2:

It's RAID 10, so I'd be less concerned about the array's health with one disk dead than I would be about using a USB drive. If it had been RAID 5 it might be a different matter, but I think you'll be fine without a tenth disk until you get around to fixing your controller - so long as you're sorting that out soon, you are right :)