Incorrect cache device after ZFS import

I've recently migrated from an Ubuntu machine to an Arch Linux machine.

I imported the pool using zpool import -f tank, and it reported my cache drive as faulted, though my storage drives are working fine. It's a raidz2 with 5 drives. The weird thing is, it's reporting the wrong drive as the cache: it lists sde, when it should be sdg. Notice that sde is also listed as a storage device.

    ❯ zpool status
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub canceled on Tue Dec 29 21:16:30 2020
config:
 
    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sde     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sdf     ONLINE       0     0     0
    cache
      sde       FAULTED      0     0     0  corrupted data

My actual cache drive is sitting idle as /dev/sdg:

~
❯ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0   1.8T  0 disk
|-sda1        8:1    0   1.8T  0 part
`-sda9        8:9    0     8M  0 part
sdb           8:16   0   1.8T  0 disk
|-sdb1        8:17   0   1.8T  0 part
`-sdb9        8:25   0     8M  0 part
sdc           8:32   0 447.1G  0 disk
|-sdc1        8:33   0   450M  0 part
|-sdc2        8:34   0   100M  0 part
|-sdc3        8:35   0    16M  0 part
|-sdc4        8:36   0 445.7G  0 part
`-sdc5        8:37   0   875M  0 part
sdd           8:48   0   1.8T  0 disk
|-sdd1        8:49   0   1.8T  0 part
`-sdd9        8:57   0     8M  0 part
sde           8:64   0   1.8T  0 disk
|-sde1        8:65   0   1.8T  0 part
`-sde9        8:73   0     8M  0 part
sdf           8:80   0   1.8T  0 disk
|-sdf1        8:81   0   1.8T  0 part
`-sdf9        8:89   0     8M  0 part
sdg           8:96   0 465.8G  0 disk
|-sdg1        8:97   0 465.8G  0 part
`-sdg9        8:105  0     8M  0 part
sr0          11:0    1  1024M  0 rom
nvme0n1     259:0    0   1.8T  0 disk
|-nvme0n1p1 259:1    0   550M  0 part /boot/EFI
`-nvme0n1p2 259:2    0   1.8T  0 part /

I'm not sure how to replace the cache drive with the correct one. The replace command throws an error:

❯ sudo zpool replace tank sde
/dev/sde is in use and contains a unknown filesystem.

I tried adding the actual cache drive back and got this error:

❯ sudo zpool add tank cache /dev/sdg
cannot add to 'tank': one or more vdevs refer to the same device
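One way to break this deadlock is to address the stale cache entry by its vdev GUID instead of by path, since the path currently collides with a data disk. A sketch, assuming zpool remove accepts the GUID of the faulted cache vdev (the GUID below is a placeholder; use the one your own pool prints):

```shell
# Show vdev GUIDs instead of device names, so the faulted cache
# entry can be addressed unambiguously even though its path (sde)
# collides with a data disk.
zpool status -g tank

# Remove the stale cache vdev by its GUID (placeholder value -
# substitute the GUID reported next to the FAULTED cache entry).
sudo zpool remove tank 1234567890123456789

# Re-add the real cache disk, ideally by its persistent by-id name
# (the name below is hypothetical - pick yours from /dev/disk/by-id).
sudo zpool add tank cache /dev/disk/by-id/ata-ExampleSSD_SERIAL
```

With the stale entry gone, the "one or more vdevs refer to the same device" error should no longer apply.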

zdb output doesn't list the cache device:

tank:
    version: 5000
    name: 'tank'
    state: 0
    txg: 3783078
    pool_guid: 3128882764625212484
    errata: 0
    hostname: 'stephen-desktop'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 3128882764625212484
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            guid: 12617640708297166488
            nparity: 2
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 10001923440640
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                guid: 13646832995608515279
                path: '/dev/sde1'
                whole_disk: 1
                DTL: 344
                create_txg: 4
                com.delphix:vdev_zap_leaf: 130
            children[1]:
                type: 'disk'
                id: 1
                guid: 437662985516969209
                path: '/dev/sdb1'
                whole_disk: 1
                DTL: 343
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
            children[2]:
                type: 'disk'
                id: 2
                guid: 12577615618022029516
                path: '/dev/sda1'
                whole_disk: 1
                DTL: 368
                create_txg: 4
                com.delphix:vdev_zap_leaf: 367
            children[3]:
                type: 'disk'
                id: 3
                guid: 14049339035002966003
                path: '/dev/sdd1'
                whole_disk: 1
                DTL: 341
                create_txg: 4
                com.delphix:vdev_zap_leaf: 133
            children[4]:
                type: 'disk'
                id: 4
                guid: 2563007804694134101
                path: '/dev/sdf1'
                whole_disk: 1
                DTL: 340
                create_txg: 4
                com.delphix:vdev_zap_leaf: 134
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

Solution 1:

You could consider importing the pool with /dev/disk/by-id names instead of the kernel's sd* names. The sd* names are assigned in enumeration order, which is not deterministic and can shift between boots or OS installs; that is likely why your pool now believes sde is both a data disk and the cache. The by-id names are derived from each drive's model and serial number, so they stay stable.

Here's an example of how to do this: https://unix.stackexchange.com/q/288599/3416
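A minimal sketch of the by-id re-import, assuming the pool can be briefly taken offline (no datasets in active use):

```shell
# Export the pool so it can be re-imported with different device names.
sudo zpool export tank

# Re-import, telling ZFS to scan persistent by-id names rather than
# the default /dev device nodes. The paths it finds are recorded in
# the pool config, so they survive future reboots.
sudo zpool import -d /dev/disk/by-id tank

# Verify the vdevs now show by-id names.
zpool status tank
```

After the re-import, zpool status should list each vdev by a name like ata-Model_Serial, and the cache device should resolve to the correct disk regardless of how the kernel enumerates sd* on the next boot.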