strange ZFS disk space usage report for a ZVOL
We have a 100G ZVOL on a FreeBSD 10.0-CURRENT host which claims to use 176G of disk space:
root@storage01:~ # zfs get all zroot/DATA/vtest
NAME PROPERTY VALUE SOURCE
zroot/DATA/vtest type volume -
zroot/DATA/vtest creation Fri May 24 20:44 2013 -
zroot/DATA/vtest used 176G -
zroot/DATA/vtest available 10.4T -
zroot/DATA/vtest referenced 176G -
zroot/DATA/vtest compressratio 1.00x -
zroot/DATA/vtest reservation none default
zroot/DATA/vtest volsize 100G local
zroot/DATA/vtest volblocksize 8K -
zroot/DATA/vtest checksum fletcher4 inherited from zroot
zroot/DATA/vtest compression off default
zroot/DATA/vtest readonly off default
zroot/DATA/vtest copies 1 default
zroot/DATA/vtest refreservation none local
zroot/DATA/vtest primarycache all default
zroot/DATA/vtest secondarycache all default
zroot/DATA/vtest usedbysnapshots 0 -
zroot/DATA/vtest usedbydataset 176G -
zroot/DATA/vtest usedbychildren 0 -
zroot/DATA/vtest usedbyrefreservation 0 -
zroot/DATA/vtest logbias latency default
zroot/DATA/vtest dedup off default
zroot/DATA/vtest mlslabel -
zroot/DATA/vtest sync standard default
zroot/DATA/vtest refcompressratio 1.00x -
zroot/DATA/vtest written 176G -
zroot/DATA/vtest logicalused 87.2G -
zroot/DATA/vtest logicalreferenced 87.2G -
root@storage01:~ #
This looks like a bug: how can it consume more than its volsize if it has no snapshots, reservations, or children? Or maybe we are missing something?
Upd:
Results of zpool status -v:
root@storage01:~ # zpool status -v
pool: zroot
state: ONLINE
scan: scrub repaired 0 in 0h6m with 0 errors on Thu May 30 05:45:11 2013
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gpt/disk0 ONLINE 0 0 0
gpt/disk1 ONLINE 0 0 0
gpt/disk2 ONLINE 0 0 0
gpt/disk3 ONLINE 0 0 0
gpt/disk4 ONLINE 0 0 0
gpt/disk5 ONLINE 0 0 0
cache
ada0s2 ONLINE 0 0 0
errors: No known data errors
root@storage01:~ #
Results of zpool list:
root@storage01:~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zroot 16.2T 288G 16.0T 1% 1.05x ONLINE -
root@storage01:~ #
Results of zfs list:
root@storage01:~ # zfs list
NAME USED AVAIL REFER MOUNTPOINT
zroot 237G 10.4T 288K /
zroot/DATA 227G 10.4T 352K /DATA
zroot/DATA/NFS 288K 10.4T 288K /DATA/NFS
zroot/DATA/hv 10.3G 10.4T 288K /DATA/hv
zroot/DATA/hv/hv001 10.3G 10.4T 144K -
zroot/DATA/test 288K 10.4T 288K /DATA/test
zroot/DATA/vimage 41.3G 10.4T 288K /DATA/vimage
zroot/DATA/vimage/vimage_001 41.3G 10.5T 6.47G -
zroot/DATA/vtest 176G 10.4T 176G -
zroot/SYS 9.78G 10.4T 288K /SYS
zroot/SYS/ROOT 854M 10.4T 854M /
zroot/SYS/home 3.67G 10.4T 3.67G /home
zroot/SYS/tmp 352K 10.4T 352K /tmp
zroot/SYS/usr 4.78G 10.4T 427M /usr
zroot/SYS/usr/local 288K 10.4T 288K /usr/local
zroot/SYS/usr/obj 3.50G 10.4T 3.50G /usr/obj
zroot/SYS/usr/ports 895K 10.4T 320K /usr/ports
zroot/SYS/usr/ports/distfiles 288K 10.4T 288K /usr/ports/distfiles
zroot/SYS/usr/ports/packages 288K 10.4T 288K /usr/ports/packages
zroot/SYS/usr/src 887M 10.4T 887M /usr/src
zroot/SYS/var 511M 10.4T 1.78M /var
zroot/SYS/var/crash 505M 10.4T 505M /var/crash
zroot/SYS/var/db 1.71M 10.4T 1.43M /var/db
zroot/SYS/var/db/pkg 288K 10.4T 288K /var/db/pkg
zroot/SYS/var/empty 288K 10.4T 288K /var/empty
zroot/SYS/var/log 647K 10.4T 647K /var/log
zroot/SYS/var/mail 296K 10.4T 296K /var/mail
zroot/SYS/var/run 448K 10.4T 448K /var/run
zroot/SYS/var/tmp 304K 10.4T 304K /var/tmp
root@storage01:~ #
Upd 2:
We created a number of ZVOLs with different parameters and used dd to move the content. We noticed another odd thing: disk usage was normal for ZVOLs with 16k and 128k volblocksize, but it remained abnormal for a ZVOL with 8k volblocksize even after dd (so this is not a fragmentation issue):
root@storage01:~ # zfs get all zroot/DATA/vtest-3
NAME PROPERTY VALUE SOURCE
zroot/DATA/vtest-3 type volume -
zroot/DATA/vtest-3 creation Fri May 31 7:35 2013 -
zroot/DATA/vtest-3 used 201G -
zroot/DATA/vtest-3 available 10.2T -
zroot/DATA/vtest-3 referenced 201G -
zroot/DATA/vtest-3 compressratio 1.00x -
zroot/DATA/vtest-3 reservation none default
zroot/DATA/vtest-3 volsize 100G local
zroot/DATA/vtest-3 volblocksize 8K -
zroot/DATA/vtest-3 checksum fletcher4 inherited from zroot
zroot/DATA/vtest-3 compression off default
zroot/DATA/vtest-3 readonly off default
zroot/DATA/vtest-3 copies 1 default
zroot/DATA/vtest-3 refreservation 103G local
zroot/DATA/vtest-3 primarycache all default
zroot/DATA/vtest-3 secondarycache all default
zroot/DATA/vtest-3 usedbysnapshots 0 -
zroot/DATA/vtest-3 usedbydataset 201G -
zroot/DATA/vtest-3 usedbychildren 0 -
zroot/DATA/vtest-3 usedbyrefreservation 0 -
zroot/DATA/vtest-3 logbias latency default
zroot/DATA/vtest-3 dedup off default
zroot/DATA/vtest-3 mlslabel -
zroot/DATA/vtest-3 sync standard default
zroot/DATA/vtest-3 refcompressratio 1.00x -
zroot/DATA/vtest-3 written 201G -
zroot/DATA/vtest-3 logicalused 100G -
zroot/DATA/vtest-3 logicalreferenced 100G -
root@storage01:~ #
and
root@storage01:~ # zfs get all zroot/DATA/vtest-16
NAME PROPERTY VALUE SOURCE
zroot/DATA/vtest-16 type volume -
zroot/DATA/vtest-16 creation Fri May 31 8:03 2013 -
zroot/DATA/vtest-16 used 102G -
zroot/DATA/vtest-16 available 10.2T -
zroot/DATA/vtest-16 referenced 101G -
zroot/DATA/vtest-16 compressratio 1.00x -
zroot/DATA/vtest-16 reservation none default
zroot/DATA/vtest-16 volsize 100G local
zroot/DATA/vtest-16 volblocksize 16K -
zroot/DATA/vtest-16 checksum fletcher4 inherited from zroot
zroot/DATA/vtest-16 compression off default
zroot/DATA/vtest-16 readonly off default
zroot/DATA/vtest-16 copies 1 default
zroot/DATA/vtest-16 refreservation 102G local
zroot/DATA/vtest-16 primarycache all default
zroot/DATA/vtest-16 secondarycache all default
zroot/DATA/vtest-16 usedbysnapshots 0 -
zroot/DATA/vtest-16 usedbydataset 101G -
zroot/DATA/vtest-16 usedbychildren 0 -
zroot/DATA/vtest-16 usedbyrefreservation 886M -
zroot/DATA/vtest-16 logbias latency default
zroot/DATA/vtest-16 dedup off default
zroot/DATA/vtest-16 mlslabel -
zroot/DATA/vtest-16 sync standard default
zroot/DATA/vtest-16 refcompressratio 1.00x -
zroot/DATA/vtest-16 written 101G -
zroot/DATA/vtest-16 logicalused 100G -
zroot/DATA/vtest-16 logicalreferenced 100G -
root@storage01:~ #
Solution 1:
VOLSIZE represents the size of the volume as it will be seen by clients, not the size of the volume as stored on the pool.
This difference may come from multiple sources:
- space required for metadata
- space required for storing multiple copies (the "copies" property)
- "wasted space" due to padding when aligning blocks of volblocksize size to the vdev structure; by vdev structure I mean two parameters: the number of disks in the raidz-N and the physical sector size of the devices.
When creating a volume, ZFS estimates how much space it will need in order to present a volume of "volsize" to its clients. You can see that difference in the vtest-3 and vtest-16 volumes, where the refreservation is 103G and 102G respectively against a volsize of 100G. The calculation can be found in libzfs_dataset.c, in zvol_volsize_to_reservation(uint64_t volsize, nvlist_t *props).
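For reference, here is a standalone sketch of that estimate (my own reconstruction of the logic, not the library code itself; the 16K indirect-block and 128-byte block-pointer constants are assumptions taken from the on-disk format). For a 100G volume it reproduces, within rounding, the 103G and 102G refreservations seen above for 8K and 16K volblocksize:

#include <stdint.h>
#include <stdio.h>

/*
 * Rough reconstruction of the zvol_volsize_to_reservation() estimate:
 * the reservation is the volume size plus an estimate of the metadata
 * (dnode levels and indirect blocks) needed to address it, with metadata
 * stored in one extra copy.  The constants are assumptions about the
 * on-disk format: 16K indirect blocks, 128-byte block pointers, at most
 * 3 DVAs per block pointer.
 */
#define INDBLKSHIFT   14                                      /* DN_MAX_INDBLKSHIFT */
#define BLKPTRSHIFT   7                                       /* SPA_BLKPTRSHIFT */
#define DVAS_PER_BP   3                                       /* SPA_DVAS_PER_BP */
#define PTRS_PER_IND  (1ULL << (INDBLKSHIFT - BLKPTRSHIFT))   /* 128 */

static uint64_t
volsize_to_reservation(uint64_t volsize, uint64_t volblocksize, int ncopies)
{
        uint64_t nblocks = volsize / volblocksize;
        uint64_t numdb = 7;             /* dnode levels L0-L6 */
        int mdcopies;

        /* one block pointer per data block, PTRS_PER_IND pointers per indirect */
        while (nblocks > 1) {
                nblocks = (nblocks + PTRS_PER_IND - 1) / PTRS_PER_IND;
                numdb += nblocks;
        }
        /* metadata is written with one extra copy, capped at 3 DVAs */
        mdcopies = (ncopies + 1 < DVAS_PER_BP) ? ncopies + 1 : DVAS_PER_BP;
        numdb *= mdcopies;
        return (volsize * ncopies + numdb * (1ULL << INDBLKSHIFT));
}

int
main(void)
{
        uint64_t volsize = 100ULL << 30;        /* 100G */

        printf("8K : %.1f G\n", volsize_to_reservation(volsize, 8192, 1) / 1073741824.0);
        printf("16K: %.1f G\n", volsize_to_reservation(volsize, 16384, 1) / 1073741824.0);
        return (0);
}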
What that calculation does not take into account is the third source. It has little impact on vdevs built from disks with 512-byte sectors, but from my experiments (verified by filling an entire zvol) it makes quite a difference when the vdev is created over newer 4K-sector disks.
Another thing I found in my experiments is that mirrored vdevs do not show a difference between the calculated refreservation and what ends up being used.
These are my results when using 4K drives with volumes that have the default volblocksize (8K). The first column represents the number of disks in a vdev:
disks raidz1 raidz2
3 135% 101%
4 148% 148%
5 162% 181%
6 162% 203%
7 171% 203%
8 171% 217%
9 181% 232%
10 181% 232%
11 181% 232%
12 181% 232%
These are my results when using 512-byte-sector drives and the default volblocksize of 8K. The first column represents the number of disks in a vdev:
disks raidz1 raidz2
3 101% 101%
4 104% 104%
5 101% 113%
6 105% 101%
7 108% 108%
8 110% 114%
9 101% 118%
10 102% 106%
11 103% 108%
12 104% 110%
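For what it is worth, the raidz figures above can be approximated with a back-of-the-envelope model (my own reconstruction, not the ZFS source): each block is stored as data sectors plus parity sectors, the allocation is rounded up to a multiple of (nparity + 1) sectors, and the pool then accounts space using a deflation ratio derived from an ideal 128K block. For the 6-disk rows this gives roughly 160%/200% on 4K sectors and 104%/100% on 512-byte sectors; the remaining percent or two is metadata:

#include <stdint.h>
#include <stdio.h>

/*
 * Back-of-the-envelope model of raidz space accounting (a reconstruction,
 * not the ZFS source).  Each block becomes data sectors plus parity sectors,
 * rounded up to a multiple of (nparity + 1) sectors.  The pool "deflates"
 * the raw allocation by a fixed ratio derived from an ideal 128K block, so
 * small blocks that carry proportionally more parity and padding show up
 * as extra "used" space.
 */
static uint64_t
raidz_asize(uint64_t psize, uint64_t sectorsize, int ndisks, int nparity)
{
        uint64_t data = (psize + sectorsize - 1) / sectorsize;
        uint64_t total = data +
            nparity * ((data + ndisks - nparity - 1) / (ndisks - nparity));
        uint64_t rnd = nparity + 1;

        total = ((total + rnd - 1) / rnd) * rnd;
        return (total * sectorsize);
}

static void
report(uint64_t volblocksize, uint64_t sectorsize, int ndisks, int nparity)
{
        /* deflation ratio: ideal 128K block vs. its actual raidz allocation */
        double deflate = (double)(128 * 1024) /
            raidz_asize(128 * 1024, sectorsize, ndisks, nparity);
        double accounted =
            raidz_asize(volblocksize, sectorsize, ndisks, nparity) * deflate;

        printf("%d-disk raidz%d, %4lluB sectors, %lluK blocks: %.0f%%\n",
            ndisks, nparity, (unsigned long long)sectorsize,
            (unsigned long long)(volblocksize / 1024),
            100.0 * accounted / volblocksize);
}

int
main(void)
{
        report(8192, 4096, 6, 1);       /* ~160%; table shows 162% */
        report(8192, 4096, 6, 2);       /* ~200%; table shows 203% */
        report(8192,  512, 6, 1);       /* ~104%; table shows 105% */
        report(8192,  512, 6, 2);       /* ~100%; table shows 101% */
        return (0);
}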
My conclusions are the following:
- Do not use 4K drives
- If you really need to use them, create the volume with a volblocksize greater than or equal to 32K; there is negligible performance impact and negligible space overhead (larger block sizes require less padding to align properly; see the worked example after this list).
- Prefer stripes of mirrors for your pools; this layout has both performance benefits and fewer space-related surprises like this one.
- The estimation is clearly wrong for the cases outlined above, and this is a bug in ZFS.
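To illustrate the volblocksize recommendation with the same back-of-the-envelope model (assuming a 6-disk raidz2 on 4K-sector drives): an 8K block needs 2 data + 2 parity sectors, rounded up to a multiple of 3, so 6 sectors (24K raw) are allocated for every 8K of data, whereas a 32K block needs 8 data + 4 parity = 12 sectors, already a multiple of 3, so 48K raw for 32K of data. After the pool's 128K-based deflation (factor 2/3) that is 16K accounted per 8K block (200%) versus 32K accounted per 32K block (100%).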