Zpool degrades when plugging in a drive

In an effort to test what impact adding a ZFS log device would have on a ZFS array, I decided to create a zpool and run some benchmarks before plugging in an SSD to act as the ZIL.

Unfortunately, whenever I plug in or unplug the SSD after having created the pool (anything that causes the drive letters to change after the pool has been created) and then reboot, the pool becomes degraded, as shown by running sudo zpool status:

  pool: zpool1
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

    NAME                     STATE     READ WRITE CKSUM
    zpool1                   DEGRADED     0     0     0
      mirror-0               DEGRADED     0     0     0
        sda                  ONLINE       0     0     0
        1875547483567261808  UNAVAIL      0     0     0  was /dev/sdc1

I suspect the problem stems from the fact that I created the pool using the drive letters, like so:

sudo zpool create -f zpool1 mirror /dev/sdb /dev/sdc

Questions

Luckily for me, this is just a test and there is no risk of losing data, but should this happen in a real-world scenario, what is the best way to recover from this issue? Obviously the drive still exists and is ready to go.

Is there a better way to create the zpool without using drive letters like /dev/sda, so as to avoid this problem in the future? I notice that the Ubuntu documentation creates a zpool in the same manner that I did.

Extra Info

  • OS: Ubuntu Server 16.04 (kernel 4.10)
  • ZFS installed via the zfsutils-linux package

Solution 1:

After getting help from Dexter_Kane on the Level 1 techs forum, the answer is to use /dev/disk/by-id/... paths instead when creating the pools.

E.g.

sudo zpool create zpool1 mirror \
/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N0PKS6S7 \
/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7VXZF6H
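If you are unsure which by-id name corresponds to which physical disk, listing the symlinks shows how each stable identifier maps to its current /dev/sdX name. The output below is just a way to cross-check; the exact names will differ on your system:

# show how the stable by-id names map to the current sdX device names
ls -l /dev/disk/by-id/
# cross-check against each disk's reported size, serial number and model
lsblk -o NAME,SIZE,SERIAL,MODEL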

Converting and Fixing Existing Pools

The good news is that you can "convert" an existing ZFS RAID array to use these labels, which prevents this from happening in the future. It will even resolve a degraded array if this situation has already happened to you.

sudo zpool export [pool name]
sudo zpool import -d /dev/disk/by-id [pool name]

You just have to make sure the pool's datasets are not in use. E.g. don't execute the commands whilst your working directory is inside the pool, and ensure the datasets aren't being shared via NFS, etc.
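A quick way to verify this is to list the pool's datasets and check that nothing has files open under their mountpoints. The /zpool1 path below is the default mountpoint for this example pool and is an assumption; substitute your own:

# list the pool's datasets and their mountpoints
zfs list -r zpool1
# check that no process has open files under the pool's mountpoint (assumed /zpool1)
sudo lsof +D /zpool1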

After performing the conversion, the output of sudo zpool status should be similar to:

  pool: zpool1
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        zpool1                                        ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N0PKS6S7  ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7VXZF6H  ONLINE       0     0     0
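With the pool healthy again, the SSD from the original experiment can now be attached as a log device using its by-id path as well. The identifier below is purely hypothetical; substitute the id of your actual SSD:

# attach the SSD as a dedicated ZIL (log) device, again using a stable by-id path
sudo zpool add zpool1 log /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21XXXXXXXXXXXX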

Testing Performed

I made sure to test that:

  • Using by-id paths did prevent the issue from happening.
  • After writing some data whilst the pool was in a degraded state, I could still read all the files after performing the export/import, and sudo zpool status reported no errors (a rough sketch of this check appears after the list).
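For reference, the second check looked roughly like the following. The commands are a sketch rather than a transcript, and they assume the pool's default mountpoint of /zpool1:

# with the pool still DEGRADED, write a test file and record its checksum
sudo dd if=/dev/urandom of=/zpool1/testfile bs=1M count=100
md5sum /zpool1/testfile
# convert the pool to by-id paths
sudo zpool export zpool1
sudo zpool import -d /dev/disk/by-id zpool1
# confirm the file is intact and that the pool reports no errors
md5sum /zpool1/testfile
sudo zpool status zpool1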