zpool replace ran successfully, but zpool status still recommends zpool replace. What is it telling me?

A drive failed in a raidz3 vdev (pci-0000:03:00.0-scsi-0:0:10:0), and I replaced it with an available spare (wwn-0x5000c500858252ef).
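As best I can reconstruct it, the replace command was along these lines (arguments taken from the status output below; the exact form may have differed slightly):

zpool replace darkpool pci-0000:03:00.0-scsi-0:0:10:0 wwn-0x5000c500858252ef

The resilver eventually completed, and zpool status then showed: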

  pool: darkpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: resilvered 3.16T in 137h44m with 0 errors on Mon Sep 23 16:07:06 2019
config:

    NAME                                  STATE     READ WRITE CKSUM
    darkpool                              DEGRADED     0     0     0
      raidz3-0                            DEGRADED     0     0     0
        wwn-0x5000c5008581aafb            ONLINE       0     0     0
        wwn-0x5000c5008581b61b            ONLINE       0     0     0
        sdm                               ONLINE       0     0     0
        sdj                               ONLINE       0     0     0
        wwn-0x5000c5008581b953            ONLINE       0     0     0
        wwn-0x5000c5008581bdf7            ONLINE       0     0     0
        wwn-0x5000c50085825ec7            ONLINE       0     0     0
        sdg                               ONLINE       0     0     0
        wwn-0x5000c5008581e423            ONLINE       0     0     0
        wwn-0x5000c5008581fd3f            ONLINE       0     0     0
        wwn-0x5000c50085820b93            ONLINE       0     0     0
        wwn-0x5000c500858211b3            ONLINE       0     0     0
        wwn-0x5000cca267ab0de4            ONLINE       0     0     0
        spare-13                          DEGRADED     0     0     0
          pci-0000:03:00.0-scsi-0:0:10:0  FAULTED      0    69     0  too many errors
          wwn-0x5000c500858252ef          ONLINE       0     0     0
    spares
      wwn-0x5000c500858252ef              INUSE     currently in use

I thought that after resilvering the spare would stop being listed as a spare and simply become a permanent member of the pool. Instead, the DEGRADED state persisted and the spare remained INUSE, filling in for the bad drive.
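My (possibly mistaken) assumption was that, at most, I would need to detach the dead disk for the spare to take its place permanently, i.e. something like the following, which I have not run:

zpool detach darkpool pci-0000:03:00.0-scsi-0:0:10:0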

The recommendation was to replace the faulted device, or use 'zpool clear' to mark the device as repaired.

I completely misunderstood what 'zpool clear' meant. That's my mistake, and I think I just made things worse:

  pool: darkpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Mon Sep 23 17:12:17 2019
    6.18T scanned out of 46.4T at 162M/s, 72h16m to go
    0B repaired, 13.31% done
config:

    NAME                                  STATE     READ WRITE CKSUM
    darkpool                              DEGRADED     0     0     0
      raidz3-0                            DEGRADED     0     0     0
        wwn-0x5000c5008581aafb            ONLINE       0     0     0
        wwn-0x5000c5008581b61b            ONLINE       0     0     0
        sdm                               ONLINE       0     0     0
        sdj                               ONLINE       0     0     0
        wwn-0x5000c5008581b953            ONLINE       0     0     0
        wwn-0x5000c5008581bdf7            ONLINE       0     0     0
        wwn-0x5000c50085825ec7            ONLINE       0     0     0
        sdg                               ONLINE       0     0     0
        wwn-0x5000c5008581e423            ONLINE       0     0     0
        wwn-0x5000c5008581fd3f            ONLINE       0     0     0
        wwn-0x5000c50085820b93            ONLINE       0     0     0
        wwn-0x5000c500858211b3            ONLINE       0     0     0
        wwn-0x5000cca267ab0de4            ONLINE       0     0     0
        spare-13                          DEGRADED     0     0     0
          pci-0000:03:00.0-scsi-0:0:10:0  FAULTED      0     0     0  corrupted data
          wwn-0x5000c500858252ef          ONLINE       0     0     0
    spares
      wwn-0x5000c500858252ef              INUSE     currently in use

errors: No known data errors

I've added another drive in the final open slot, wwn-0x5000cca26788a8f8 (shown as /dev/sdq in the lsscsi output below), but what should I be replacing here?
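My best guess, which I have not tried, is that the target of the replace is the faulted member of spare-13, with the new drive as the replacement, i.e. something like:

zpool replace darkpool pci-0000:03:00.0-scsi-0:0:10:0 /dev/disk/by-id/wwn-0x5000cca26788a8f8

but I'm not sure that's right while the spare is still attached.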


Output of lsscsi:

[0:0:2:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c500858211b3                  /dev/sda 
[0:0:3:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581b953                  /dev/sdb 
[0:0:4:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c50085825ec7                  /dev/sdc 
[0:0:5:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581e423                  /dev/sdd 
[0:0:6:0]    disk    HGST     HUH721008AL5205  D384  0x5000cca26788a8f8                  /dev/sdq 
[0:0:7:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581b61b                  /dev/sde 
[0:0:8:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581aafb                  /dev/sdf 
[0:0:9:0]    disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581cc03                  /dev/sdg 
[0:0:10:0]   disk    HGST     HUH721008AL5205  D384  0x5000cca267ab0de4                  /dev/sdh 
[0:0:11:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c50085823d2b                  /dev/sdi 
[0:0:12:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581b933                  /dev/sdj 
[0:0:13:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581bdf7                  /dev/sdk 
[0:0:14:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c50085820b93                  /dev/sdl 
[0:0:15:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581b79f                  /dev/sdm 
[0:0:16:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c500858252ef                  /dev/sdn 
[0:0:17:0]   disk    SEAGATE  ST8000NM0075     PS24  0x5000c5008581fd3f                  /dev/sdo 
[0:2:0:0]    disk    DELL     PERC H330 Adp    4.27  0x61866da05f3bc2001f1c1a0d117e72cf  /dev/sdp 
[10:0:0:0]   cd/dvd  HL-DT-ST DVD+-RW GHB0N    A1C0  0x5001480000000000                  /dev/sr0 

sudo zfs get version darkpool
NAME      PROPERTY  VALUE    SOURCE
darkpool  version   5        -

I believe the pool was created on Ubuntu 14.04; here is the relevant zpool history:

History for 'darkpool':
2016-07-15.10:38:13 [txg:5] create pool version 5000; software version 5000/5; uts hippocampus 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64

No, I don't know who set these names up. I prefer by-id, though it doesn't seem to matter anymore in Ubuntu 18.
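(If I ever want to clean up the mixed device names, my understanding is that exporting the pool and re-importing it with the by-id search path would do it, roughly:

zpool export darkpool
zpool import -d /dev/disk/by-id darkpool

but that's a separate issue from the failed drive.)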


Solution 1:

You probably cleared your zpool errors too early (the resilver was still in progress). However, you should not have done any harm to the pool's health.

Let the pool finish resilvering, then issue zpool clear darkpool, and finally run zpool scrub darkpool. That should be enough to get back to a clean pool.
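In other words, something along these lines, waiting for each step to finish before starting the next (pool name taken from the question):

zpool status darkpool   # wait until no resilver/scrub is running
zpool clear darkpool    # reset the error counters and FAULTED markers
zpool scrub darkpool    # re-verify all data in the pool
zpool status darkpool   # confirm the result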