Does speed of ZFS snapshot rollback depend on number of files?

Solution 1:

The number of files and directories involved in a zfs send/recv stream should have no direct impact on its transfer speed. Indirectly it might, because a dataset with more directories and files will usually be more spread out across your disks, depending on the workload that generated them. This matters because a hard disk handles a sequential read far more easily than a random read, and if the data for the stream is scattered all over your disks, reading it will be much more of a random workload than a sequential one.

Further, it is my understanding that files on ZFS filesystems (but not zvols) have ZFS metadata associated with them; I have no direct numbers, but I would be unsurprised if a single 2.5 TB file had, on the whole, significantly fewer metadata blocks associated with it than 2.5 TB spread across 15 million files. These (potentially many) extra metadata blocks are additional data that must be read, which means more disk reading (and potentially seeking). So yes, it is likely that, indirectly, a send stream consisting of 15 million files will be slower to create than one consisting of a single file of the same size (especially if that one file was created all at once, as a sequential write, on a pool with plenty of contiguous free space at the time).

It is very common for ZFS send/recv streams that are sent out unbuffered to have very spotty performance - at times they seem to go quite quickly, then drop to nearly nothing for potentially long periods of time. This behavior has been described and even analyzed to some extent in various forums on the internet, so I won't get into it. The general take-away is that while ZFS can and should do some work internally to make this a more efficient workflow, a 'quick fix' for many of the issues is to introduce a buffer on both the sending and the receiving side. For this, the most common tool is 'mbuffer'.

By piping your zfs send through mbuffer before netcat (and again through mbuffer before zfs recv), you should see a marked improvement if the underlying problem is one that adding a buffer can assist with. Alasdair has a terse write-up on it over on his blog -- I don't have anything published on this topic at the moment, so I'll point you at his: http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/
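As a rough sketch of the buffered pipeline described above: the small script below only prints the two commands, so you can look them over before running them by hand. The dataset names, hostname, port and buffer sizes are all made-up placeholders rather than recommendations, and the exact listen syntax for netcat differs between variants (some want `nc -l -p PORT`), so treat it as a starting point.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; adjust for your systems.
SNAPSHOT="tank/data@snap1"   # assumed source snapshot
TARGET="backup/data"         # assumed destination dataset
RECV_HOST="recvhost"         # assumed hostname of the receiving machine
PORT=9090                    # assumed free TCP port
BUF_OPTS="-s 128k -m 1G"     # mbuffer: 128k block size, 1G of buffer memory

# Command to start first, on the receiving machine.
receiver_cmd() {
    echo "nc -l $PORT | mbuffer $BUF_OPTS | zfs recv $TARGET"
}

# Command to start second, on the sending machine.
sender_cmd() {
    echo "zfs send $SNAPSHOT | mbuffer $BUF_OPTS | nc $RECV_HOST $PORT"
}

receiver_cmd
sender_cmd
```

Start the receiver command first so something is listening when the sender connects. mbuffer can also handle the network transport itself (`-I PORT` on the receiving side, `-O HOST:PORT` on the sending side), which would let you drop netcat entirely, but the version above follows the send | mbuffer | netcat layout described here.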