zpool replace ran successfully, but zpool status still recommends a replace. What is it telling me?
A drive failed in a raidz3 (pci-0000:03:00.0-scsi-0:0:10:0), and I replaced it with an available spare (wwn-0x5000c500858252ef):
pool: darkpool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: resilvered 3.16T in 137h44m with 0 errors on Mon Sep 23 16:07:06 2019
config:
NAME                                STATE     READ WRITE CKSUM
darkpool                            DEGRADED     0     0     0
  raidz3-0                          DEGRADED     0     0     0
    wwn-0x5000c5008581aafb          ONLINE       0     0     0
    wwn-0x5000c5008581b61b          ONLINE       0     0     0
    sdm                             ONLINE       0     0     0
    sdj                             ONLINE       0     0     0
    wwn-0x5000c5008581b953          ONLINE       0     0     0
    wwn-0x5000c5008581bdf7          ONLINE       0     0     0
    wwn-0x5000c50085825ec7          ONLINE       0     0     0
    sdg                             ONLINE       0     0     0
    wwn-0x5000c5008581e423          ONLINE       0     0     0
    wwn-0x5000c5008581fd3f          ONLINE       0     0     0
    wwn-0x5000c50085820b93          ONLINE       0     0     0
    wwn-0x5000c500858211b3          ONLINE       0     0     0
    wwn-0x5000cca267ab0de4          ONLINE       0     0     0
    spare-13                        DEGRADED     0     0     0
      pci-0000:03:00.0-scsi-0:0:10:0  FAULTED    0    69     0  too many errors
      wwn-0x5000c500858252ef        ONLINE       0     0     0
spares
  wwn-0x5000c500858252ef            INUSE     currently in use
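For reference, the replacement was issued roughly like this (I didn't save the exact command, so treat it as an approximation):
sudo zpool replace darkpool pci-0000:03:00.0-scsi-0:0:10:0 wwn-0x5000c500858252ef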
I thought the spare would stop showing as a spare after resilvering and become a regular member of the pool, but the DEGRADED state persisted and the spare was still INUSE, filling in for the bad drive.
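(My assumption, which may well be wrong, was that promoting the spare would just be a matter of detaching the dead drive once the resilver finished, something like
sudo zpool detach darkpool pci-0000:03:00.0-scsi-0:0:10:0
leaving wwn-0x5000c500858252ef in its place as a regular member.)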
The recommendation was to replace the faulted device, or use 'zpool clear' to mark the device as repaired.
I completely misunderstood what clear meant. That's my mistake, and I think I just made things worse.
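As best I can reconstruct, what I ran was simply
sudo zpool clear darkpool
(possibly naming the faulted device as well; I didn't keep the exact invocation). Afterwards the status looked like this: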
pool: darkpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub in progress since Mon Sep 23 17:12:17 2019
6.18T scanned out of 46.4T at 162M/s, 72h16m to go
0B repaired, 13.31% done
config:
NAME                                STATE     READ WRITE CKSUM
darkpool                            DEGRADED     0     0     0
  raidz3-0                          DEGRADED     0     0     0
    wwn-0x5000c5008581aafb          ONLINE       0     0     0
    wwn-0x5000c5008581b61b          ONLINE       0     0     0
    sdm                             ONLINE       0     0     0
    sdj                             ONLINE       0     0     0
    wwn-0x5000c5008581b953          ONLINE       0     0     0
    wwn-0x5000c5008581bdf7          ONLINE       0     0     0
    wwn-0x5000c50085825ec7          ONLINE       0     0     0
    sdg                             ONLINE       0     0     0
    wwn-0x5000c5008581e423          ONLINE       0     0     0
    wwn-0x5000c5008581fd3f          ONLINE       0     0     0
    wwn-0x5000c50085820b93          ONLINE       0     0     0
    wwn-0x5000c500858211b3          ONLINE       0     0     0
    wwn-0x5000cca267ab0de4          ONLINE       0     0     0
    spare-13                        DEGRADED     0     0     0
      pci-0000:03:00.0-scsi-0:0:10:0  FAULTED    0     0     0  corrupted data
      wwn-0x5000c500858252ef        ONLINE       0     0     0
spares
  wwn-0x5000c500858252ef            INUSE     currently in use
errors: No known data errors
I've added another drive in the final open slot, wwn-0x5000cca26788a8f8, but what should I be replacing here?
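My best guess is something along the lines of
sudo zpool replace darkpool pci-0000:03:00.0-scsi-0:0:10:0 wwn-0x5000cca26788a8f8
but I'm not sure whether the target should be that faulted placeholder or the spare that is currently standing in for it.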
Output of lsscsi:
[0:0:2:0] disk SEAGATE ST8000NM0075 PS24 0x5000c500858211b3 /dev/sda
[0:0:3:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581b953 /dev/sdb
[0:0:4:0] disk SEAGATE ST8000NM0075 PS24 0x5000c50085825ec7 /dev/sdc
[0:0:5:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581e423 /dev/sdd
[0:0:6:0] disk HGST HUH721008AL5205 D384 0x5000cca26788a8f8 /dev/sdq
[0:0:7:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581b61b /dev/sde
[0:0:8:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581aafb /dev/sdf
[0:0:9:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581cc03 /dev/sdg
[0:0:10:0] disk HGST HUH721008AL5205 D384 0x5000cca267ab0de4 /dev/sdh
[0:0:11:0] disk SEAGATE ST8000NM0075 PS24 0x5000c50085823d2b /dev/sdi
[0:0:12:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581b933 /dev/sdj
[0:0:13:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581bdf7 /dev/sdk
[0:0:14:0] disk SEAGATE ST8000NM0075 PS24 0x5000c50085820b93 /dev/sdl
[0:0:15:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581b79f /dev/sdm
[0:0:16:0] disk SEAGATE ST8000NM0075 PS24 0x5000c500858252ef /dev/sdn
[0:0:17:0] disk SEAGATE ST8000NM0075 PS24 0x5000c5008581fd3f /dev/sdo
[0:2:0:0] disk DELL PERC H330 Adp 4.27 0x61866da05f3bc2001f1c1a0d117e72cf /dev/sdp
[10:0:0:0] cd/dvd HL-DT-ST DVD+-RW GHB0N A1C0 0x5001480000000000 /dev/sr0
sudo zfs get version darkpool
NAME PROPERTY VALUE SOURCE
darkpool version 5 -
I believe the pool was created on Ubuntu 14.04 (from zpool history):
History for 'darkpool':
2016-07-15.10:38:13 [txg:5] create pool version 5000; software version 5000/5; uts hippocampus 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64
No, I don't know who set these names up; I prefer by-id, though it doesn't seem to matter anymore in Ubuntu 18.
Solution 1:
You probably cleared your zpool errors too early (while the resilver was still in progress); however, you should not have done any harm to pool health.
Let your pool finish resilvering, then issue zpool clear darkpool, and finally run zpool scrub darkpool. This should be enough to have a clean pool.
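Concretely, once the current scan has finished, that would be roughly:
sudo zpool clear darkpool
sudo zpool scrub darkpool
sudo zpool status darkpool   # check the result once the scrub completes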