How to 're-balance' data in ZFS? (Make sure the data is spread among all striped mirrors)
Using a striped mirror (RAID 10) as an example: if a pool of two disks is expanded to four, how can the data already on the first mirror be 're-balanced' (spread out) across both mirrors?
That is, how can files that were written to one mirror be rewritten across two mirrors to take advantage of the striped disks?
Solution 1:
Only newly written (appended) data is split between all currently active vdevs; there is no explicit rebalance operation in ZFS.
Two possible solutions:
- Wait until all old data is rewritten in the normal course of operation (because of copy-on-write, this can take a very long time; in the worst case, double the time it would take to write the disk completely).
- Remove all data and write it anew (`zfs send`/`zfs recv` is helpful for getting all data off the pool and back without losing anything). This does not have to be done all in one go, and it can be done on the same pool.
To be more precise, I would choose the second solution and transfer each file system separately at times when system load is low (for example, at night):
- Take a snapshot (`zfs snapshot -r`) of a decently sized file system (and its descendant file systems, recursively)
- Use `zfs send | zfs recv` with appropriate options to send the snapshot to a newly created temporary file system (which can be on the same pool if space permits); this file system should sit at the same location in the hierarchy as the old one
- After the copy is done (it may take some time, because the disks have to read and write), `zfs destroy` the old snapshot and the old file system
- `zfs rename` the temporary file system to the old name
- Check and adjust mount points with `zfs mount`, restoring the previous situation for the replaced file system
- Repeat until all file systems are moved
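As a minimal sketch of the whole cycle, assuming a pool named `tank` with a file system `tank/data` and a temporary name `tank/data-new` (all names hypothetical, adjust to your own layout):

```sh
# Snapshot the file system and all its descendants
zfs snapshot -r tank/data@rebalance

# Replicate it to a temporary file system on the same pool;
# -R carries descendants and properties, -u avoids mounting the
# copy over the original while the original still exists
zfs send -R tank/data@rebalance | zfs recv -u tank/data-new

# Remove the old, unbalanced copy and move the new one into place
zfs destroy -r tank/data
zfs rename tank/data-new tank/data

# Drop the now-unneeded snapshot and remount
zfs destroy -r tank/data@rebalance
zfs mount tank/data
```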
Solution 2:
A possible third solution (as mentioned by SirMaster in this FreeNAS forum post):
- add new disk(s) to the zpool
- copy the files to a new /mnt/pool/temp/ directory
- delete the original files: `rm -rf original/`
- rename the directory back: `mv temp/ original/`
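A minimal sketch of that procedure, assuming the pool is mounted at /mnt/pool and the data lives in /mnt/pool/original (both paths hypothetical):

```sh
# Re-writing the data lets ZFS stripe the new blocks across all vdevs
cp -a /mnt/pool/original /mnt/pool/temp

# Sanity-check the copy before deleting anything
diff -r /mnt/pool/original /mnt/pool/temp

# Free the blocks on the old vdev and restore the original path
rm -rf /mnt/pool/original
mv /mnt/pool/temp /mnt/pool/original
```

Note this temporarily needs as much free space as the data being copied.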
This works because ZFS proportionally directs new writes to whichever vdev has the most free space, in this case the brand-new drives, which are empty. (As of ZFS 0.7, writes also favor faster drives, but let's assume your two new drives have the same or better performance than the original ones.)
It's probably slower than `zfs send | zfs recv`, but simpler because you don't have to create and destroy snapshots.
You can run `zpool list -v` before and after to see each vdev's utilization.
Also, I found a PHP script that does the copy/delete/rename procedure on a file-by-file basis. It was linked in an answer to a similar (but zvol-related) question a few years older. (I haven't tested that script personally.)
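The same file-by-file idea can be sketched in plain shell (this is not the linked PHP script; the path is hypothetical):

```sh
# Rewrite each file in place so its blocks are reallocated across all
# vdevs. Caveats: breaks hard links, and each file briefly exists twice.
find /mnt/pool/original -type f -exec sh -c '
  for f; do
    cp -a "$f" "$f.rebalance" && mv "$f.rebalance" "$f"
  done
' sh {} +
```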