Identifying the number of changed bytes between two ZFS snapshots of the same file
Let's assume I have a ZFS filesystem that contains virtual machine disk images, e.g.
/tank/examplevm/examplevm-flat.vmdk
Let's further assume I take daily snapshots of that ZFS filesystem, e.g.
$ zfs snapshot tank@20120716
$ zfs snapshot tank@20120717
Between consecutive daily snapshots, examplevm-flat.vmdk changes: in most cases the size of the image stays constant, but blocks within the virtual disk are modified.
Accordingly, zfs diff reports the file as modified between the two snapshots:
$ zfs diff tank@20120716 tank@20120717
M /tank/examplevm/examplevm-flat.vmdk
While it is good to know that the file has been modified, I would be much more interested in the number of bytes/blocks that have been modified in the vmdk.
Therefore, I'd be interested in any hints on the following questions:
- Does ZFS have any feature to report the number of changed blocks in a specific file between two snapshots?
- Is there any other tool that will binary diff two file system images and report the number of changed blocks or bytes? I realize that
cmp -l file1 file2 | wc -l
does that, but it is horribly, horribly slow.
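The cmp pipeline above can be extended to report, in the same single pass, how many fixed-size blocks contain at least one changed byte. The two versions of the file can also be read in place through the snapshot directory (/tank/.zfs/snapshot/<snapshot>/...) rather than being copied out first. What follows is a minimal POSIX sh sketch under stated assumptions: the function name count_changes is hypothetical, and the 128 KiB default block size is an assumption based on the default ZFS recordsize, so adjust it to the dataset's actual recordsize. It is still limited by the speed of cmp -l, so it adds block granularity but does not address the slowness.

```shell
#!/bin/sh
# Hypothetical helper: count changed bytes and changed fixed-size blocks
# between two versions of a file in one pass over `cmp -l` output.
# Usage: count_changes OLD NEW [BLOCKSIZE]
# Assumption: BLOCKSIZE defaults to 131072 (128 KiB, the default ZFS recordsize).
count_changes() {
    old=$1
    new=$2
    bs=${3:-131072}
    # cmp -l prints one line per differing byte: "<1-based offset> <old octal> <new octal>".
    # Offsets arrive in increasing order, so every change of block index
    # marks a new changed block. Bytes beyond the end of the shorter file
    # are not counted (cmp's "EOF on ..." notice is discarded).
    cmp -l "$old" "$new" 2>/dev/null | awk -v bs="$bs" '
        BEGIN { last = -1 }
        {
            bytes++
            blk = int(($1 - 1) / bs)
            if (blk != last) { blocks++; last = blk }
        }
        END { printf "%d bytes, %d blocks\n", bytes, blocks }'
}

# Run only when invoked with at least two file arguments.
if [ "$#" -ge 2 ]; then
    count_changes "$@"
fi
```

For the files in the question, assuming the script is saved as count_changes.sh (a hypothetical name), the call would be sh count_changes.sh /tank/.zfs/snapshot/20120716/examplevm/examplevm-flat.vmdk /tank/.zfs/snapshot/20120717/examplevm/examplevm-flat.vmdk.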
ZFS has no feature to report that. However, the undocumented zdb tool can be used to list the blocks used by a file in a particular dataset (filesystem or snapshot), so achieving what you are looking for with a little scripting is doable, although processing the zdb output would probably take a very long time.
Here is a blog post showing how to use zdb to extract a file's blocks.