Linux software RAID - partition first?

I have two identical drives that I intend to mirror in the interest of data safety. These are data-only drives, not a primary OS drive.

In such a system, is it better to create a single partition (Linux raid auto: type 0xfd) on each drive and RAID together the partition from each drive (e.g. /dev/sdb1 and /dev/sdc1)? Or should I instead create a mirrored array of the unpartitioned drives directly (e.g. /dev/sdb and /dev/sdc)?

Ultimately I intend to create an LVM container from the resultant array for storing the actual data. Are there any considerations that might make one or the other option safer or more desirable down the road?


Solution 1:

If you're going to create a mirrored array, use mdadm first to create the mirror, then use LVM on top of it to create your physical volume, volume group, and logical volumes. Then lay a filesystem on top. While this example is in a Kickstart context, it still illustrates the order of operations:

  • Create the physical partitions to mirror. The example puts five partitions on each of two physical devices, but you can just lay down a single partition on each disk.

  • The "raid pv.01" line uses two partitions to create a mirror pair to use as an LVM physical volume.

  • The remaining lines (volgroup, logvol) create the volume group and logical volumes.
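A minimal Kickstart fragment of that shape, with a single partition per disk instead of five, might look like the following. This is a sketch, not the original example: the disk names (sdb, sdc), vg_data, lv_data, the /data mount point, and ext4 are all assumptions made for illustration.

```
# One growable RAID member partition on each disk (disk names assumed)
part raid.01 --size=1 --grow --ondisk=sdb
part raid.02 --size=1 --grow --ondisk=sdc

# Mirror the two partitions and use the result as an LVM physical volume
raid pv.01 --level=1 --device=md0 raid.01 raid.02

# Volume group and one logical volume on top of the mirror (names assumed)
volgroup vg_data pv.01
logvol /data --vgname=vg_data --name=lv_data --size=1 --grow --fstype=ext4
```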

So, how would you do this on a running system? Well, if you're talking about your root and related filesystems, you probably shouldn't. Mirroring those filesystems should really be done at installation time. Otherwise, for running systems:

  1. Start with fdisk or parted (my preference) to create the physical disk partitions.

  2. Then, mirror those partitions as described here. Here is more information on mdadm.

  3. Finally, use the Logical Volume Manager to put a physical volume on that new mirror-pair, create a volume group, and create logical volumes to use for your filesystems.
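As a rough sketch, the three steps above could be strung together like this. The device names (/dev/sdb, /dev/sdc, /dev/md0), the GPT label, the vg_data and lv_data names, and ext4 are assumptions, not anything prescribed here; adjust them for your system. Because every one of these commands is destructive, the script only prints what it would do unless you set DRY_RUN=0 and run it as root.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
# Device, volume group, and logical volume names are assumptions.
DISK_A=${DISK_A:-/dev/sdb}
DISK_B=${DISK_B:-/dev/sdc}

# Execute the command for real only when DRY_RUN=0; otherwise just echo it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

build_mirror() {
    # 1. One full-size partition per disk, flagged as a RAID member.
    for disk in "$DISK_A" "$DISK_B"; do
        run parted -s "$disk" mklabel gpt
        run parted -s "$disk" mkpart primary 1MiB 100%
        run parted -s "$disk" set 1 raid on
    done

    # 2. Mirror the two partitions (sdX-style partition naming assumed).
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "${DISK_A}1" "${DISK_B}1"

    # 3. LVM on top of the mirror, then a filesystem on the logical volume.
    run pvcreate /dev/md0
    run vgcreate vg_data /dev/md0
    run lvcreate -n lv_data -l 100%FREE vg_data
    run mkfs.ext4 /dev/vg_data/lv_data
}

build_mirror
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 once you are sure the device names are right.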

Good luck!

Solution 2:

No, there is no fundamental reason why you should create a single, full-disk partition on each member drive as opposed to using it unpartitioned. I use this method all the time, and haven't seen any issues.

The only likely issues are going to be documentation/social issues. If the array breaks for some reason and some other admin is trying to recover it, and they assume you partitioned each drive first and can't find the partitions, they might assume the data is totally lost.

Of course, this is somewhat trivially avoided because md's metadata is still there, so if they do a scan with mdadm they should still be able to find it.
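For example, a recovery scan on whole-disk members might go roughly like this. The device names /dev/sdb and /dev/sdc are assumptions taken from the question. As a precaution the sketch only prints the commands unless DRY_RUN=0 is set, since the last one actually assembles arrays.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
# Device names are assumptions.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

find_md_members() {
    # Read the md superblock straight from the whole disks (no partitions).
    run mdadm --examine /dev/sdb /dev/sdc
    # Print an ARRAY line for every array whose metadata mdadm can find.
    run mdadm --examine --scan
    # Assemble whatever arrays the scan turns up.
    run mdadm --assemble --scan
}

find_md_members
```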