ZFS Recover from Faulted Pool State

I have a six-disk ZFS raidz1 pool and recently had a drive failure requiring a disk replacement. No problem normally, but this time my server hardware died before I could do the replacement (after the drive failure, and unrelated to it as far as I can tell).

I was able to get another machine from a friend to rebuild the system, but in the process of moving my drives over I had to swap their cables around until I found a configuration where the remaining 5 good disks were seen as online. This process seems to have generated some checksum errors on the pool/raidz.
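In case it helps with debugging, the on-disk labels can be checked per drive with zdb, independent of whatever /dev/sdX name each disk ended up with. This is just a sketch, assuming zdb is installed and that ZFS put the label on the first partition (the partition paths are examples, not taken from my system):

# Read the ZFS label from each surviving member; every one should report
# the same pool name ("tank") and pool_guid.
zdb -l /dev/sdd1
zdb -l /dev/sde1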

I now have the 5 remaining drives set up and a good disk installed, ready to take the place of the drive that died. However, since my pool state is FAULTED, I'm unable to do the replacement.

root@zfs:~# zpool replace tank 1298243857915644462 /dev/sdb
cannot open 'tank': pool is unavailable

Is there any way to recover from this error? I would think that having 5 of the 6 drives online would be enough to rebuild the right data, but that doesn't seem to be enough now.
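(Assuming the pool can be brought back to at least a DEGRADED state, my understanding is that the replacement itself would just be the same command again, followed by watching the resilver:)

# Replace the missing member (identified by its GUID in zpool status)
# with the new disk, then check resilver progress.
zpool replace tank 1298243857915644462 /dev/sdb
zpool status -v tank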

Here's the status output for my pool:

root@zfs:~# zpool status tank
  pool: tank
 state: FAULTED
status: One or more devices could not be used because the label is missing or invalid.
        There are insufficient replicas for the pool to continue functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

    NAME                     STATE     READ WRITE CKSUM
    tank                     FAULTED      0     0     1  corrupted data
      raidz1-0               ONLINE       0     0     8
        sdd                  ONLINE       0     0     0
        sdf                  ONLINE       0     0     0
        sdh                  ONLINE       0     0     0
        1298243857915644462  UNAVAIL      0     0     0  was /dev/sdb1
        sde                  ONLINE       0     0     0
        sdg                  ONLINE       0     0     0

Update (10/31): I tried to export and re-import the array a few times over the past week and wasn't successful. First I tried:

zpool import -f -R /tank -N -o readonly=on -F tank

That produced this error immediately:

cannot import 'tank': I/O error
       Destroy and re-create the pool from a backup source.
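For reference, the options in that command are all standard zpool import flags; here is the same attempt with each one annotated:

# -f              force the import even if the pool appears active elsewhere
# -R /tank        altroot: mount everything relative to /tank
# -N              import the pool without mounting any datasets
# -o readonly=on  open the pool read-only so recovery writes nothing
# -F              recovery mode: discard the last few transactions if needed
zpool import -f -R /tank -N -o readonly=on -F tank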

I added the '-X' option to the above command to try to make it check the transaction log. I let that run for about 48 hours before giving up because it had completely locked up my machine (I was unable to log in locally or via the network).

Now I'm trying a simple zpool import tank command, which has been running for a while with no output. I'll leave it running overnight to see if it produces anything.

Update (11/1): zpool import tank has been running for about 12 hours now with no command line output so far. However, my computer is still responsive so that's a plus.
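(Aside, for anyone in the same spot: with no command output, one way to tell whether the import is still making progress or has hung is to watch disk activity from a second shell. iostat is in the sysstat package; the device names below are my pool members:)

# Show extended per-device stats every 5 seconds; sustained reads on the
# pool members suggest the import is still walking metadata.
iostat -x sdd sde sdf sdg sdh 5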


Basically, there is no official way to recover other than restoring from backup. But there is a ZFS feature called rewind that can discard recent transactions from the pool back to a point where the pool is functional again. The following text is from ZFS Internals blog part #11:

DO NOT TRY IT IN PRODUCTION. USE AT YOUR OWN RISK!

zpool import -FX mypool, where the options mean (example commands follow the list):
* -F Attempt rewind if necessary.
* -X Turn on extreme rewind.
* -T Specify a starting txg to use for the import. This is an intentionally undocumented option meant for testing purposes.
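Beyond the quoted text, here is a sketch of how these options are typically combined, keeping the pool read-only and under an altroot so a successful rewind can be inspected before anything is written back. The txg number and mount point are placeholders, not values from this pool, and -T may be missing from some implementations (as noted below for zfs-fuse 0.7.0):

# Dry run: report whether a rewind could make the pool importable,
# without actually importing or modifying anything.
zpool import -F -n tank

# Extreme rewind, read-only, mounted under a scratch altroot.
zpool import -FX -o readonly=on -R /mnt/recovery tank

# If a specific uberblock txg is known (see the zdb example below), try
# importing from it directly; 1234567 is a placeholder txg.
zpool import -T 1234567 -o readonly=on -R /mnt/recovery tank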

First I tried to recover using this rewind procedure. It didn't work for me; maybe it is not implemented in zfs-fuse for Linux. According to ZFSOnDiskFormat.pdf, each label holds an array of 128 uberblocks, each with its own txg. In my zfs-fuse version 0.7.0 the -T option doesn't exist, so I modified zfs-fuse to list the available txgs in the uberblock array and to allow importing from a txg with a specific ID. Using the modified zfs-fuse I was able to access the filesystems in the pool.
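On a current ZFS on Linux / OpenZFS install (rather than zfs-fuse), the uberblock array can usually be listed without patching anything: zdb can print each uberblock's txg and timestamp, which is what you would feed to -T. The partition path is an example:

# Dump the vdev label plus its uberblock array (up to 128 entries) from one
# pool member; each uberblock entry includes a txg and a timestamp.
zdb -ul /dev/sdd1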

I did recover my pool using this method, so it is possible. But it is an unsupported method and has to be done very carefully, as it is pretty easy to make things even worse. In my opinion, Sun/Oracle should provide an fsck for ZFS for these situations.