ZFS vdevs accumulate checksum errors, but individual disks do not
I'm running a vendor-specific derivative of FreeNAS 9.3.
My trouble started when I installed a new JBOD chassis to add two new vdevs to my pool, and the chassis turned out to have a bad board. During that time I was seeing SAS power errors on the drives attached to the bad board: my new drives were effectively power-cycling, over and over, every minute.
I replaced the board and now, by most measures, the drives are functioning well, but ZFS is still giving me extremely strange checksum errors when I view `zpool status`. I think there were some bad CoW writes when I was having the SAS power issues.
The first chassis (with the CPU, boot drive, RAM, etc.) connects to the first JBOD expansion chassis via mini-SAS, and the second JBOD expansion chassis is daisy-chained off the first expansion chassis, also via mini-SAS:
- [Chassis 1: boot drive, two L2ARC SSDs, 11/11 drives of RAIDZ3-0, 1/11 drives of RAIDZ3-1] --> mini-SAS to Chassis 2
- [Chassis 2: 10/11 drives of RAIDZ3-1, 6/11 drives of RAIDZ3-2] --> mini-SAS to Chassis 3
- [Chassis 3: 5/11 drives of RAIDZ3-2, 11/11 drives of RAIDZ3-3]
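To keep track of which disk lives where, I've been mapping devices to chassis/slots with something like the following (a sketch; `camcontrol` is in the FreeBSD base system, `sas2ircu` is the LSI utility, and the controller index `0` is just an example):

```
# list every disk the OS currently sees
camcontrol devlist

# show enclosure/slot placement for the drives behind one HBA
# ("0" is the controller index; repeat for each HBA)
sas2ircu 0 display
```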
The checksum errors don't map neatly to any one controller or chassis, but my hunch is that whatever data was being written to the new disks during the power issues was written badly across the two new vdevs.
My HBAs are all on good LSI firmware: either 20.00.04.00 or 20.00.08.00.
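(For what it's worth, this is how I've been confirming those versions; `sas2flash` is the LSI flashing utility that ships with FreeNAS, as far as I know:)

```
# print each LSI controller with its firmware and BIOS revisions
sas2flash -listall
```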
I've swapped mini-SAS cables and tried different ports, to no avail.
The output of `zpool status` shows checksum errors accumulating on the two new vdevs, and after a scrub, a reboot, or a `zpool clear`, `zpool status` eventually marks those vdevs as degraded. What's strange is that it also marks some of the drives belonging to those vdevs as degraded, even though the error counts of the individual disks are all 0. `zdb` shows that the individual drives are marked degraded because they have too many checksum errors, even though all of their checksum error counts are actually 0. It's also strange that the pool-level checksum error count is lower than the checksum errors of the two problem vdevs added together.
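To rule out the physical drives themselves, I've been spot-checking SMART data on the members ZFS flags; roughly like this (the device name is just a placeholder for one of the "degraded" disks):

```
# SAS drives report their health, grown-defect, and uncorrected-error
# counters here; /dev/da30 is a placeholder device name
smartctl -a /dev/da30 | egrep -i 'health|defect|uncorrected'
```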
`zpool status -v` persistently shows a permanent error in a snapshot mapped to a `0x0` inode that has long since been deleted, and that error can't be cleared by multiple scrubs, reboots, or `zpool clear`. Other permanent errors also float in and out, sometimes showing up only as hex-code inodes and other times as parts of recent snapshots. I can't find any `0x0` with `lsof`.
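For concreteness, this is roughly how I keep checking the error list and hunting for the object it points at (the snapshot name is the one that appears in the output further down):

```
# show just the "Permanent errors" section of the status output
zpool status -v pool | sed -n '/Permanent errors/,$p'

# look for the snapshot the permanent error refers to
zfs list -t snapshot -r pool | grep auto-20160720.2300-2d
```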
I believe there is some kind of metadata corruption in the pool. I'm looking for a way to surgically remove these phantom snapshots or otherwise return my pool to a healthy state without destroying my data. I suspect that somewhere, ZFS is iterating over these corrupt phantom snapshots and causing both the bizarre checksum errors and the degraded states on the vdevs.
I have "cold" LTO backups of much of my important data, but if I can't repair the pool, I'm preparing to set up a second server, offload everything to that "hot" second server, destroy my pool at the top level, and then reload from the hot copy.
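If it comes to that, the offload itself would just be a recursive send/receive, something along these lines (the hostname `backup-server` and target pool `tank` are placeholders):

```
# snapshot everything and replicate the whole pool to the second server
zfs snapshot -r pool@migrate-1
zfs send -R pool@migrate-1 | ssh backup-server zfs receive -duF tank
```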
Here's the output of `zpool status -v`:
```
[root@Jupiter] ~# zpool status -v
  pool: freenas-boot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the configured
        block size, or migrate data to a properly configured pool.
  scan: resilvered 944M in 0h17m with 0 errors on Tue Aug 9 11:56:28 2016
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da46p2    ONLINE       0     0     0  block size: 8192B configured, 8388608B native
            da47p2    ONLINE       0     0     0  block size: 8192B configured, 8388608B native

errors: No known data errors

  pool: pool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Fri Sep 9 22:43:51 2016
        6.27T scanned out of 145T at 1.11G/s, 35h27m to go
        0 repaired, 4.33% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool                                            DEGRADED     0     0   118
          raidz3-0                                      ONLINE       0     0     0
            gptid/ac108605-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ac591d4e-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ac92fd0d-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/accd3076-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ad067e97-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ad46cbee-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ad91ba17-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/adcbdd0a-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ae07dc0d-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ae494d10-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
            gptid/ae93a3a5-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
          raidz3-1                                      ONLINE       0     0     0
            gptid/12f6a4c5-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/511ea1f9-1932-11e6-9b1e-0cc47a599098  ONLINE       0     0     0
            gptid/14436fcf-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/14f50aa3-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/159b5654-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/163d682b-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/16ee624e-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/1799dde3-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/184c2ea4-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/18f51c30-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
            gptid/19a861ea-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
          raidz3-2                                      DEGRADED     0     0   236
            gptid/5f80fc42-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/60369e0f-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/60e8234a-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/61a235f2-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/62580471-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/6316a38a-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/63d4bce8-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/ebfc2b99-6893-11e6-9b09-0cc47a599098  ONLINE       0     0     0
            gptid/654f143a-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/66236b33-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/66eda3f6-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
          raidz3-3                                      DEGRADED     0     0   176
            gptid/c77a9da9-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
            gptid/c83e100e-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
            gptid/c8fd9ced-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/c9bb21ba-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/ca7a48db-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/cb422329-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
            gptid/cbfe4c21-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
            gptid/ccc43528-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
            gptid/cd93a34c-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
            gptid/ce622f51-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
            gptid/cf2591d3-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        cache
          gptid/aedd3872-265c-11e5-9a02-0cc47a599098    ONLINE       0     0     0
          gptid/af559c10-265c-11e5-9a02-0cc47a599098    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x357>:<0x2aef3>
        <0x37b>:<0x397285>
        pool/[email protected]:<0x0>
```
Via the FreeNAS GUI, I tried copying the System dataset pool from `pool` over to `freenas-boot`, and then tried using `zfs destroy` to delete the `pool` copy of `pool/.system` while leaving the `freenas-boot` copy intact. I was able to use `zfs destroy` to delete everything within `pool/.system` listed in `zfs list`, but upon trying to destroy `pool/.system` itself, the shell returned the error `Cannot iterate filesystems: I/O error`. I tried `zfs destroy` on `pool/.system` with the `-f`, `-r`, and `-R` flags, as per the Oracle ZFS documentation, to no avail.
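For concreteness, the failing commands looked roughly like this:

```
zfs destroy pool/.system        # Cannot iterate filesystems: I/O error
zfs destroy -f pool/.system     # same error
zfs destroy -r pool/.system     # same error
zfs destroy -R pool/.system     # same error
```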
I started yet another scrub. Perhaps eliminating the contents of `pool/.system` on the `pool` copy of the System dataset will allow the scrub to clear out the metadata error from the phantom snapshot `pool/[email protected]`.
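(For completeness, starting and watching the scrub is nothing fancier than:)

```
zpool scrub pool
zpool status pool | grep -A 2 'scan:'
```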
I'm wondering if it's possible to resilver each disk that shows up as degraded, one by one, so that the "bad" metadata that's no longer referenced can be abandoned. I've resilvered two disks, but now I'm running into an issue where resilvering any additional disk causes the disks I've already resilvered to begin resilvering again at the same time. I believe it might be a ZFS bug related to periodic snapshot tasks, and I've gone ahead and deleted my periodic snapshot task and destroyed all my snapshots, but I'm hesitant to resilver yet another one of the degraded drives for fear that all the previously resilvered disks will resilver again, eventually leaving me without any redundancy and with a faulted pool.
After disabling my periodic snapshot tasks and deleting all my snapshots, I tried wiping one disk and then resilvering it, but the three disks I'd already resilvered started resilvering again. Now I'm almost certain that two different disks in each problem RAIDZ3 vdev would resilver, so if I attempt to resilver any more disks, I'll lose the redundancy in each of the problem vdevs and my pool will fault.
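The per-disk procedure amounts to something like this (a sketch; the gptid is one of the degraded members from the output above, and the replacement gptid is whatever the new partition gets after the wipe, so it's shown as a placeholder):

```
# take the member offline, wipe/re-partition the disk, then trigger a
# resilver by replacing the old partition with the new one
zpool offline pool gptid/c8fd9ced-4e02-11e6-b7cf-0cc47a599098
# ...wipe and re-partition the disk (I do this via the FreeNAS GUI)...
zpool replace pool gptid/c8fd9ced-4e02-11e6-b7cf-0cc47a599098 gptid/<new-partition-gptid>
```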
One other bizarre behavior: checking `zpool status -v` actually increases the pool's checksum error count incrementally, but checking plain `zpool status` does not. It's almost as if the `-v` flag itself is iterating over whatever mechanism is causing the checksum errors.
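It's easy to watch: something like the following shows the pool's CKSUM count creeping up only when the verbose form is run in between:

```
zpool status pool | grep -w pool       # note the pool's CKSUM column
zpool status -v pool > /dev/null       # run the verbose version once
zpool status pool | grep -w pool       # the CKSUM count has gone up
```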
Would using `zdb -c` on my pool somehow be able to "fix" these metadata errors?
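(What I have in mind is just a traversal/verification pass like the one below; my understanding is that `zdb` only reads, which is partly why I'm asking whether it can actually fix anything:)

```
# verify checksums of all metadata blocks (-c), or of every block (-cc);
# as far as I know this only reads and reports, it does not repair
zdb -c pool
zdb -cc pool
```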
The `0x0` and other hexadecimal numbers appear instead of filenames and other objects when metadata is corrupted. If you cannot get rid of them by destroying the affected objects (I understood them to refer to snapshots), then the damage is probably too extensive to repair. In that case I would restore the pool from backup, especially since you are seeing further weird effects such as broken metadata appearing and disappearing.
You can read about methods for getting rid of most of these problems in the ZFS administration guide here. ZFS also gives you a URL where to look for solutions when you type `zpool status` on a troubled pool.
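In your case that pointer is already in the output you posted, for example:

```
# the "see:" line in zpool status links to the relevant troubleshooting article
zpool status pool | grep 'see:'
#    see: http://illumos.org/msg/ZFS-8000-8A
```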