Clear a permanent ZFS error in a healthy pool
I scrubbed my pool today, and after the scrub finished, I noticed there was an error that corrupted a file. I didn't care about the file, so I deleted it. Unfortunately, the error remains (now referenced by a hex ID and not a filename), and I don't know how to clear it.
- Should I be worried? Am I not really free of this error just yet?
- Can I clear the error? If the file is gone, I don't really want to see this error in the future.
For reference, here are the commands I issued and the output, with annotations:
Checking status
kevin@atlas:~$ sudo zpool status -v
pool: zstorage
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 1.81M in 7h19m with 1 errors on Wed Feb 19 10:04:44 2014
config:
NAME STATE READ WRITE CKSUM
zstorage ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T1735698 ONLINE 0 0 0
ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0506289 ONLINE 0 0 0
ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2711600 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause/03 Harmony One.mp3
Switching to root and deleting the file - I don't need it
kevin@atlas:~$ sudo -i
root@atlas:~# cd /zstorage/owncloud/kevin/files/Archives/Music/Kev\ Rev\ 7/graveyard/Old/Four\ Tet/Pause/
root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# rm 03\ Harmony\ One.mp3
Checking status again
root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# zpool status -v
pool: zstorage
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 1.81M in 7h19m with 1 errors on Wed Feb 19 10:04:44 2014
config:
NAME STATE READ WRITE CKSUM
zstorage ONLINE 0 0 1
raidz1-0 ONLINE 0 0 2
ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T1735698 ONLINE 0 0 0
ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0506289 ONLINE 0 0 0
ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2711600 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
zstorage:<0x9f115>
Uh oh. Maybe I can clear the error?
root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# zpool clear zstorage
root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# zpool status -v
pool: zstorage
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 1.81M in 7h19m with 1 errors on Wed Feb 19 10:04:44 2014
config:
NAME STATE READ WRITE CKSUM
zstorage ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T1735698 ONLINE 0 0 0
ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0506289 ONLINE 0 0 0
ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2711600 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
zstorage:<0x9f115>
This doesn't look good!
Scrub your pool again (if you haven't already):
zpool scrub zstorage
That error is telling you that inode <0x9f115> is corrupt (deleting the file broke the filename->inode mapping, so it's just reporting the inode now). Either something still has the file open or the metadata just needs to be cleaned up (which a scrub should do).
To clear the error if a scrub won't you need to get down and dirty with zdb, which is not publicly documented by oracle (and poorly documented elsewhere) - and at any rate probably indicates something more fundamentally wrong.
I know I'm super late to the party, but just wanted to add that if the additional scrubs don't fix issues like this, instead of looking at zdb
you can instead just start a scrub, let it run for a couple minutes, and then stop it with zpool scrub -s zstorage
. That will worked for me at clearing permanent errors for files when when all the read/write/checksum errors were at zero.
http://unixetc.co.uk/2012/01/22/zfs-corruption-persists-in-unlinked-files/
EDIT: After having to do this a few times I also realized that the timing of how long you let the scrub run will affect whether it works (depending on what blocks it does looks at first). So if it doesn't work at first, try a few more times and adjust the timing of when you stop it.