zpool fails to import raidz3 pool despite sufficient replicas available
Some users were having issues connecting to this server's share on the pool, while others who were already connected seemed to be fine. After I arranged a reboot, the pool failed to import once the system booted.
During the reboot I noticed that a drive had faulted during POST, indicated by an orange light on the bezel and shown in the zpool import output below.
The pool has enough devices to be brought online, but it won't successfully import.
$ zpool import
pool: darkpool
id: 5743344949875332602
state: DEGRADED
status: One or more devices contains corrupted data.
action: The pool can be imported despite missing or damaged devices. The
fault tolerance of the pool may be compromised if imported.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
config:
darkpool DEGRADED
raidz3-0 DEGRADED
wwn-0x5000c5008581aafb ONLINE
wwn-0x5000c5008581b61b ONLINE
wwn-0x5000c5008581b79f ONLINE
wwn-0x5000c5008581b933 ONLINE
wwn-0x5000c5008581b953 ONLINE
wwn-0x5000c5008581bdf7 ONLINE
wwn-0x5000c50085825ec7 ONLINE
wwn-0x5000c5008581cc03 ONLINE
wwn-0x5000c5008581e423 UNAVAIL
wwn-0x5000c5008581fd3f ONLINE
wwn-0x5000c50085820b93 ONLINE
wwn-0x5000c500858211b3 ONLINE
wwn-0x5000cca267ab0de4 ONLINE
spare-13 DEGRADED
11992420879588183985 FAULTED corrupted data
wwn-0x5000c500858252ef ONLINE
spares
wwn-0x5000c500858252ef
$ zpool status
no pools available
$ zpool import darkpool
cannot import 'darkpool': I/O error
Destroy and re-create the pool from
a backup source.
$ zpool import -f darkpool
cannot import 'darkpool': I/O error
Destroy and re-create the pool from
a backup source.
$ zpool import -fFn darkpool
$ zpool import -F darkpool
cannot import 'darkpool': I/O error
Destroy and re-create the pool from
a backup source.
$ zpool import -fFX darkpool
cannot import 'darkpool': I/O error
Destroy and re-create the pool from
a backup source.
Has anyone seen something like this before? I'm not sure what to try before destroying the pool and restoring from a backup (I'd like to avoid this since it will take so long).
It looks like the backups started to fail a couple of weeks ago. Is there any way to know if having the faulted drive serviced would make the pool happy?
The system is Ubuntu 18.04.2 LTS with zfsutils-linux_0.7.5-1ubuntu16.7_amd64.
Solution 1:
I wound up signing up for LinkedIn Premium so I could message a ZFS developer (who was actually kind enough to respond!). He suggested I move the pool to a system with ZFS 0.8, a version that includes his relevant commits on GitHub and that ships with Ubuntu 19.10, among other distros.
In read-only mode, we were able to load the pool by disabling the module parameter spa_load_verify_metadata. This also skips the scan of the pool, so you don't have to wait minutes or hours depending on the size of your pool.
Once the pool was loaded I started a backup of everything to a different server, with plans to destroy the pool and server (too many on-site trips from Dell, replacing CPUs, memory, the mobo, etc...), and start fresh with a new system.
Toggling the Option (Ubuntu 19.10):
$ cat /sys/module/zfs/parameters/spa_load_verify_metadata
1
$ echo 0 >/sys/module/zfs/parameters/spa_load_verify_metadata
$ cat /sys/module/zfs/parameters/spa_load_verify_metadata
0
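The same toggle can be wrapped in a small helper that prints the previous value, so it can be put back once the data has been copied off. This is only a sketch: the function name set_zfs_param and the PARAM_DIR variable are made up for illustration and are not part of ZFS, and writing to the real sysfs path requires root.

```shell
#!/bin/sh
# Hypothetical helper, not part of ZFS: write a new value to a module
# parameter file and print the value it held before.
# PARAM_DIR is an assumption so the function can be pointed at another
# directory (for example a scratch directory) when trying it out.
PARAM_DIR="${PARAM_DIR:-/sys/module/zfs/parameters}"

set_zfs_param() {
    name="$1"
    value="$2"
    file="$PARAM_DIR/$name"

    # Bail out early if the parameter file is missing or not writable.
    if [ ! -w "$file" ]; then
        echo "cannot write $file (is the zfs module loaded, and are you root?)" >&2
        return 1
    fi

    old=$(cat "$file")        # remember the current setting
    echo "$value" > "$file"   # apply the new setting
    echo "$old"               # report the previous setting to the caller
}

# Example use (run as root):
#   old=$(set_zfs_param spa_load_verify_metadata 0)
#   ... import the pool read-only and copy the data off ...
#   set_zfs_param spa_load_verify_metadata "$old" >/dev/null
```

Capturing the old value rather than hard-coding 1 keeps the helper from assuming what the default was on a given system.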
Loading the Pool
$ zpool import -f -o readonly=on darkpool
The parameter resets to its default after a reboot, so the pool won't import during the boot process. But really, you want to copy the data off and stop using the pool anyway.