ZFS: scrub vs. resilver (are they equivalent?)

Is a resilver as good as a scrub? If not, why?

Use case: checksum errors come up during a scrub. Instead of letting the scrub finish, I stop it, replace the drive, and resilver. Does the resilver do some or all of the checking that the scrub would have done?


Solution 1:

A scrub reads all the data in the zpool and verifies each block against its checksum, repairing any block that fails verification from the redundant copy or parity.

A resilver re-copies all the data onto one device from the data and parity information on the other devices in the vdev: for a mirror it simply copies the data from the other device in the mirror; for a raidz vdev it reads data and parity from the remaining drives to reconstruct the missing data.
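As a rough illustration of the two reconstruction paths described above, here is a toy Python model. It assumes plain XOR single parity over equal-sized blocks, which is a simplification of what raidz1 really does (real raidz uses variable-width stripes and per-block checksums), and the function names are made up for this sketch.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resilver_mirror(surviving_copy):
    # Mirror: the replacement device just receives a copy of the surviving side.
    return bytes(surviving_copy)


def resilver_single_parity(surviving_blocks):
    # Single parity (toy model): the missing block is the XOR of every
    # surviving block in the stripe, i.e. the remaining data plus the parity.
    return xor_blocks(surviving_blocks)


data = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks in one stripe
parity = xor_blocks(data)           # parity stored alongside the data

# Pretend the device that held data[1] has been replaced.
rebuilt = resilver_single_parity([data[0], data[2], parity])
mirrored = resilver_mirror(data[0])
```

In this model every surviving block takes part in the reconstruction, so a bad block on any remaining device would produce a wrong result with nothing left to correct it.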

They are not the same, and in my interpretation they are not equivalent. If a resilver encounters an error while reconstructing a copy of the data, it may well be a permanent error, since the data can no longer be reconstructed correctly. Conversely, if a scrub detects corruption, it can usually be repaired from the remaining data and parity (this also happens silently at times during normal use).

Solution 2:

If you are replacing a drive that hasn't completely failed, it is beneficial to keep the old drive attached, because it provides additional redundancy during the resilvering process. If you have no redundancy left, any further errors will result in some data loss in the affected files.

A resilver operation will read the minimum amount of data required to restore redundancy onto the replacement disk. A scrub operation will read ALL data, both primary and parity data.

So if you are resilvering a 2-way mirror or a raidz1, they are equivalent, as the resilver has to read all the surviving data. If you are resilvering a 3-way mirror, raidz2 or raidz3, the resilver will not read all of the surviving data, so in those cases scrub and resilver are not equivalent.
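The counting argument in this answer can be sketched in a few lines of Python. This only models the reasoning above (minimum devices needed versus devices that survive after one is replaced); it is not a description of how ZFS actually schedules its reads, and the function names and parameters are hypothetical.

```python
def devices_read_by_resilver(vdev_type, width, parity=0):
    """Return (minimum devices read, surviving devices) with one device replaced.

    width  -- total number of devices in the vdev
    parity -- parity level for raidz (1, 2 or 3); ignored for mirrors
    """
    surviving = width - 1
    if vdev_type == "mirror":
        needed = 1               # any one intact copy is enough
    elif vdev_type == "raidz":
        needed = width - parity  # as many columns as there are data columns
    else:
        raise ValueError(f"unknown vdev type: {vdev_type}")
    return needed, surviving


def resilver_reads_everything(vdev_type, width, parity=0):
    # True when the minimum read set equals all surviving devices,
    # which is the case this answer treats as equivalent to a scrub.
    needed, surviving = devices_read_by_resilver(vdev_type, width, parity)
    return needed == surviving


two_way_mirror = resilver_reads_everything("mirror", 2)
three_way_mirror = resilver_reads_everything("mirror", 3)
raidz1_5_disks = resilver_reads_everything("raidz", 5, parity=1)
raidz2_6_disks = resilver_reads_everything("raidz", 6, parity=2)
```

Under these assumptions only the 2-way mirror and raidz1 cases come out as reading every surviving device, which matches the distinction drawn above.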