ZFS: How do you restore the correct number of copies after losing a drive?

zfs

"copies=2" (or 3) is designed mainly for pools with no redundancy (a single disk or stripes). The goal is to recover from minor disk corruption, not from the failure of a whole device. In the latter case, the pool is unmountable, so no ditto block restoration can occur.

If you have redundancy (mirroring/raidz/raidz2/raidz3), the ditto blocks are no different from any other blocks, and scrubbing/resilvering will recreate them.

I found this question really intriguing, and after spending an hour poring over documentation, I dived into the code. Here's what I found.

First, some terminology. Ditto blocks (which are what these copies are, as opposed to mirrors) are automatically created on a write but may or may not be in the same virtual device (vdev) as the original copy. On the other hand, mirrored blocks are always reflected onto another virtual device.

However, the code refers to both types of blocks as children. You'll see here that ditto blocks are just children with io_vd == NULL (this is in the write function). For a mirrored block, io_vd would be set to the corresponding virtual device (your second disk, for example).

With that in mind, when it gets to the read portion, it treats all children (be they mirror or ditto blocks) as potentially unsafe if they don't contain the expected good_copies, and rewrites them as needed. So it sounds like the answer to your question is: yes, it will rewrite them when you have at least one good copy and any of the following is true:

Unexpected errors when you tried to read the data,
You are resilvering, or
You are scrubbing.
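The rule above can be sketched as a small toy model. This is only an illustration of the decision as described in this answer, not the actual ZFS code: the `Child` class, the `children_to_rewrite` function, and its flag names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    """One copy of a block: a mirror side or a ditto copy (hypothetical model)."""
    name: str
    good: bool  # True if this copy read back correctly


def children_to_rewrite(
    children: List[Child],
    unexpected_errors: bool = False,
    resilvering: bool = False,
    scrubbing: bool = False,
) -> List[str]:
    """Return the names of the copies that would be rewritten.

    Mirrors the rule described in the answer: a repair needs at least one
    good copy, plus one of the three triggers.
    """
    good_copies = sum(1 for c in children if c.good)
    if good_copies == 0:
        return []  # nothing left to repair from
    if not (unexpected_errors or resilvering or scrubbing):
        return []  # no trigger, so nothing is rewritten
    return [c.name for c in children if not c.good]


if __name__ == "__main__":
    copies = [Child("copy0", good=True), Child("copy1", good=False)]
    print(children_to_rewrite(copies, scrubbing=True))
```

In this model a scrub with one good and one bad copy reports the bad copy for rewriting, while a block with no good copies at all reports nothing, because there is no source to repair from.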

Phew! Maybe someone can point out flaws, but I enjoyed learning about ZFS through this little exercise, and I hope this helps!

@jlliagre and others seem to think that the entire zpool dies if one of the disks (vdevs) dies and the pool is not redundant (mirror/raidz). This is not true; a multi-disk pool will always survive a single complete disk failure even if it is not a mirror or raidz.

ZFS metadata is always copied at least 2 times, so total failure of a complete disk (or any portion of it) will not take down the file system. Furthermore, many files, especially smaller ones, will not be spread across all disks and will therefore not necessarily be faulted by the disk failure. The OP is asking about the case of a multi-disk pool using ditto blocks (user data copies > 1). Here, a single complete disk failure ~~should never result in any data loss.~~ ZFS will always try to put ditto blocks far away from the original block, and for pools with multiple vdevs, this always means on another vdev (an exception might be where one vdev is >50% of the pool, which would be very unusual). File system metadata is also always copied +1 or +2 times more than the ditto level, so it will always survive disk failure. Furthermore, if you have a pool of more than three disks, you should be able to lose up to half of them without any data loss; ZFS stores the ditto blocks on the next disk over, so as long as you never lose two adjacent disks, you never have data loss (three adjacent disk failures for ditto=2).

When there are sufficient copies of data to access a file (whether those copies come from ditto blocks, a mirror, or raidz), all missing copies of that data are repaired when the file is accessed. This is the purpose of a scrub: read all data and fix any that is bad by making use of redundant copies. So to answer the OP's question directly, you just need to do a scrub after replacing the failed drive, and all copies will be restored.

As always, you can easily experiment with the concepts by creating pools whose vdevs for backing store are just ordinary sparse files. By deleting or corrupting the vdev files you can simulate any type of failure, and can verify integrity of the pool, file systems, and data along the way.
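A minimal sketch of setting up such a sandbox, assuming a system with the ZFS tools installed and root privileges for the `zpool` and `zfs` steps. The helper names (`make_sparse_vdev_files`, `experiment_commands`), the pool name `sandbox`, and the file sizes are hypothetical choices for illustration. The script only creates the backing files and prints the commands; it does not run them.

```python
import os
import tempfile
from typing import List


def make_sparse_vdev_files(
    directory: str, count: int = 2, size_bytes: int = 128 * 1024 * 1024
) -> List[str]:
    """Create `count` empty backing files of `size_bytes` each.

    truncate() extends the file without writing data, so on most
    filesystems the result is sparse and uses almost no disk space.
    """
    paths = []
    for i in range(count):
        # zpool expects absolute paths for file-backed vdevs
        path = os.path.abspath(os.path.join(directory, f"vdev{i}.img"))
        with open(path, "wb") as f:
            f.truncate(size_bytes)
        paths.append(path)
    return paths


def experiment_commands(pool: str, vdev_paths: List[str]) -> List[List[str]]:
    """Build (but do not run) the commands for one experiment."""
    return [
        ["zpool", "create", pool] + list(vdev_paths),  # striped, non-redundant pool
        ["zfs", "set", "copies=2", pool],              # enable ditto blocks for user data
        # ... write test data here, then damage or remove a backing file ...
        ["zpool", "scrub", pool],                      # read everything, repair what it can
        ["zpool", "status", "-v", pool],               # inspect errors and pool state
        ["zpool", "destroy", pool],                    # clean up afterwards
    ]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp(prefix="zfs-sandbox-")
    files = make_sparse_vdev_files(workdir)
    for argv in experiment_commands("sandbox", files):
        print(" ".join(argv))
```

Printing the commands rather than executing them keeps the script safe to run anywhere; paste them into a root shell one at a time so you can check the pool state between steps.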

EDIT: after experimenting, it looks like ZFS will fail the pool if a disk fails in a multi-disk non-redundant pool with copies>=2. Partial data corruption on one or more disks should remain survivable and should be fixed by a scrub.
