How to clear ZFS DEGRADED status in repaired pool

I got my first drive failure after a couple of years of maintaining this zpool, so I used zpool replace to swap the failed drive out for one of my spares. It took 60 hours (as shown below) to resilver the array, but it completed with zero errors.
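For reference, the replace command was roughly of the following form (the device IDs match the status output below; the exact invocation is reconstructed from memory):

zpool replace sbn ata-ST4000DM005-2DP166_ZDH1TNCF ata-ST4000DM005-2DP166_ZDH1TW8L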

The problem is that it is still showing DEGRADED status. The output is:

# zpool status
  pool: sbn
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: resilvered 1.07T in 60h9m with 0 errors on Fri Aug  7 01:15:41 2020
config:

    NAME                                   STATE     READ WRITE CKSUM
    sbn                                    DEGRADED     0     0     0
      raidz2-0                             DEGRADED     0     0     0
        ata-ST4000DM005-2DP166_ZDH1TP9H    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TM7G    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TLHP    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TL8F    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TNT8    ONLINE       0     0     0
        spare-5                            UNAVAIL      0     0     0
          15983766503331633058             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000DM005-2DP166_ZDH1TNCF-part1
          ata-ST4000DM005-2DP166_ZDH1TW8L  ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TW63    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TM4R    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TLSG    ONLINE       0     0     0
        ata-ST4000DM005-2DP166_ZDH1TMAM    ONLINE       0     0     0
    spares
      ata-ST4000DM005-2DP166_ZDH1TW8L      INUSE     currently in use
      ata-ST4000DM005-2DP166_ZDH1TM17      AVAIL   

errors: No known data errors

I can't find any documentation that explains the spare-5 structure, which showed up after I did the replace. The dead drive now appears only as the numeric GUID 15983766503331633058, and the pool remembers the original failed disk ID as ata-ST4000DM005-2DP166_ZDH1TNCF.

How do I clean this up so it is running with 10 clean drives again with one available spare?


Solution 1:

You would need to run the following command:

zpool clear sbn

This will clear all errors associated with the virtual devices in the pool, and clear any data error counts associated with the pool.

Source: https://docs.oracle.com/cd/E36784_01/html/E36835/gbbvf.html
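For what it's worth, zpool clear can also be pointed at a single device instead of the whole pool if you only want to reset the error counters on that vdev, for example using one of the device names from the status output above:

zpool clear sbn ata-ST4000DM005-2DP166_ZDH1TW8L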

Solution 2:

After some time I found the answer: it turns out the failed drive needs to be detached from the pool. When the original faulted device is detached, the hot spare that replaced it assumes its place in the configuration permanently. In this specific case I ran:

zpool detach sbn ata-ST4000DM005-2DP166_ZDH1TNCF

Note that the drive ID is taken from the "was" statement in the zpool status output above. Once this is done, zpool status comes back clean and the pool is marked with state: ONLINE.
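If you also want to get back to a full complement of spares once the dead disk has been physically replaced, the new disk can be added back as a hot spare. A minimal sketch, assuming the new disk shows up under the hypothetical ID ata-NEWDISK:

# confirm the pool has returned to ONLINE after the detach
zpool status sbn

# add the physically replaced disk back as a hot spare
# (ata-NEWDISK is a placeholder for the new drive's by-id name)
zpool add sbn spare /dev/disk/by-id/ata-NEWDISK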

Hopefully this helps someone in a similar situation.