"No space left on device" error despite having plenty of space, on btrfs
Almost everywhere, I'm getting failures in the logs complaining about "No space left on device".
GitLab logs:
==> /var/log/gitlab/nginx/current <==
2016-11-29_20:26:51.61394 2016/11/29 20:26:51 [emerg] 4871#0: open() "/var/opt/gitlab/nginx/nginx.pid" failed (28: No space left on device)
Dovecot email logs:
Nov 29 20:28:32 aws-management dovecot: imap([email protected]): Error: open(/home/vmail/emailuser/Maildir/dovecot-uidlist.lock) failed: No space left on device
Output of df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/xvda1     ext4      7.8G  3.9G  3.8G  51% /
devtmpfs       devtmpfs  1.9G   28K  1.9G   1% /dev
tmpfs          tmpfs     1.9G   12K  1.9G   1% /dev/shm
/dev/xvdh      btrfs      20G   13G  7.9G  61% /mnt/durable
/dev/xvdh      btrfs      20G   13G  7.9G  61% /home
/dev/xvdh      btrfs      20G   13G  7.9G  61% /opt/gitlab
/dev/xvdh      btrfs      20G   13G  7.9G  61% /var/opt/gitlab
/dev/xvdh      btrfs      20G   13G  7.9G  61% /var/cache/salt
It looks like there is also plenty of inode space. Output of df -i:
Filesystem     Inodes  IUsed  IFree IUse% Mounted on
/dev/xvda1     524288 105031 419257   21% /
devtmpfs       475308    439 474869    1% /dev
tmpfs          480258      4 480254    1% /dev/shm
/dev/xvdh           0      0      0     - /mnt/durable
/dev/xvdh           0      0      0     - /home
/dev/xvdh           0      0      0     - /opt/gitlab
/dev/xvdh           0      0      0     - /var/opt/gitlab
/dev/xvdh           0      0      0     - /var/cache/salt
Output of btrfs fi show
Label: none  uuid: 6546c241-e57e-4a3f-bf43-fa933a3b29f9
    Total devices 4 FS bytes used 11.86GiB
    devid    1 size 10.00GiB used 10.00GiB path /dev/xvdh
    devid    2 size 10.00GiB used 9.98GiB path /dev/xvdi
    devid    3 size 10.00GiB used 9.98GiB path /dev/xvdj
    devid    4 size 10.00GiB used 9.98GiB path /dev/xvdk
Output of btrfs fi df /mnt/durable
Data, RAID10: total=17.95GiB, used=10.12GiB
Data, single: total=8.00MiB, used=0.00
System, RAID10: total=16.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID10: total=2.00GiB, used=1.74GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=272.00MiB, used=8.39MiB
What could be causing this? I'm using a base Linux AMI on EC2, kernel version 4.4.5-15.26.amzn1.x86_64.
Update
Running the command suggested below, btrfs fi balance start -dusage=5 /mnt/durable, gave me the following error:
ERROR: error during balancing '/mnt/durable' - No space left on device
There may be more info in syslog - try dmesg | tail
After manually deleting a bunch of larger files totaling ~1GB, I rebooted the machine and tried again, making sure I was using sudo, and this time the command went through. I then rebooted once more for good measure, and that seems to have solved the problem.
Welcome to the world of BTRFS. It has some tantalizing features but also some infuriating issues.
First off, some info on your setup: it looks like you have four drives in a BTRFS "raid 10" volume (so all data is stored twice, on different disks). This BTRFS volume is then carved up into subvolumes on different mount points. The subvolumes share a pool of disk space but have separate inode numbers and can be mounted in different places.
BTRFS allocates space in "chunks", and each chunk is dedicated to one class: data or metadata (plus a small system class). What can happen, and looks like it has happened in your case, is that all free space gets allocated to data chunks, leaving no room for new metadata chunks.
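You can see the "everything is already allocated to chunks" situation in the btrfs fi show output: for each device, size minus used is the space not yet handed out to any chunk. A minimal sketch that does that subtraction, assuming the "devid ... size ... used ... path ..." line format shown in the question (other btrfs-progs versions may print it differently); unallocated_per_device is a hypothetical helper name.

```shell
# Read `btrfs filesystem show` output on stdin and print, per device,
# how much space is not yet allocated to any chunk (in GiB).
unallocated_per_device() {
    awk '
    # Convert a btrfs-progs size such as 9.98GiB or 512.00MiB to GiB.
    function gib(v,    n, u) {
        n = v + 0                        # leading number, e.g. 9.98
        u = v; sub(/^[0-9.]+/, "", u)    # unit suffix, e.g. GiB
        if (u == "TiB") return n * 1024
        if (u == "GiB") return n
        if (u == "MiB") return n / 1024
        if (u == "KiB") return n / (1024 * 1024)
        return n / (1024 * 1024 * 1024)  # plain bytes, e.g. 0.00B
    }
    # Fields: devid <id> size <size> used <used> path <device>
    $1 == "devid" {
        printf "%s %.2f GiB unallocated\n", $8, gib($4) - gib($6)
    }'
}
```

For example: sudo btrfs fi show /mnt/durable | unallocated_per_device. With the figures from the question it reports 0.00 GiB for /dev/xvdh and 0.02 GiB for each of the other three devices.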
It also seems that, for reasons I don't fully understand, BTRFS "runs out" of metadata space before the indicated proportion of metadata space used reaches 100%.
That appears to be what has happened here: there is plenty of free space inside the data chunks (10.12GiB used of 17.95GiB), but practically no space left that has not been allocated to chunks, and not enough free space in the existing metadata chunks (1.74GiB used of 2.00GiB).
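A plausible reason for the early "running out": the unknown line in btrfs fi df (newer btrfs-progs label it GlobalReserve) is an emergency reserve held back inside the metadata chunks, so it is not available for ordinary metadata. The sketch below is a rough estimate under that assumption (it ignores other in-flight reservations): it adds up the Metadata lines and subtracts both the used space and the reserve. metadata_headroom_mib is a hypothetical helper name.

```shell
# Read `btrfs filesystem df` output on stdin and print an estimate of the
# metadata space (MiB) left after subtracting used space and the global reserve.
metadata_headroom_mib() {
    awk -F'[ ,=]+' '
    # Convert a btrfs-progs size such as 1.74GiB or 272.00MiB to MiB.
    function mib(v,    n, u) {
        n = v + 0
        u = v; sub(/^[0-9.]+/, "", u)
        if (u == "TiB") return n * 1024 * 1024
        if (u == "GiB") return n * 1024
        if (u == "MiB") return n
        if (u == "KiB") return n / 1024
        return n / (1024 * 1024)         # plain bytes, or a bare 0.00
    }
    # Lines look like: Metadata, RAID10: total=2.00GiB, used=1.74GiB
    $1 == "Metadata" { total += mib($4); used += mib($6) }
    # Reserve: labelled "unknown" by older btrfs-progs, "GlobalReserve" by newer ones
    $1 == "unknown" || $1 == "GlobalReserve" { reserve += mib($4) }
    END { printf "%.1f\n", total - used - reserve }'
}
```

Fed the btrfs fi df output from the question, it prints 2.2, i.e. roughly 2MiB of headroom out of about 2GiB of metadata.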
The fix is to run a "rebalance". This moves data around so that some chunks can be returned to the "global" free pool, where they can be reallocated as metadata chunks:
btrfs fi balance start -dusage=5 /mnt/durable
The number after -dusage sets how aggressive the rebalance is, that is, how close to empty a data chunk has to be to get rewritten: -dusage=5 only touches data chunks that are under 5% full. If the balance reports that it relocated 0 chunks, try again with a higher -dusage value.
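The "try again with a higher value" step can be automated. A sketch, assuming btrfs-progs reports the outcome as "Done, had to relocate N out of M chunks" (the wording may vary between versions); the helper names and the 5/10/20/40 sequence are arbitrary illustrative choices, not anything btrfs prescribes. It stops at the first run that relocated something, and also stops if a run fails (for example with No space left on device), since a higher -dusage needs more room, not less.

```shell
# Print N from a "Done, had to relocate N out of M chunks" line on stdin (0 if none).
relocated_chunks() {
    awk '/had to relocate/ {
             for (i = 1; i < NF; i++) if ($i == "relocate") n = $(i + 1)
         }
         END { print n + 0 }'
}

# Re-run a data-only balance with growing -dusage until it relocates at least
# one chunk. Hypothetical helper; the percentage sequence is an example only.
balance_until_progress() {
    _mnt=$1
    for _pct in 5 10 20 40; do
        _out=$(btrfs balance start -dusage="$_pct" "$_mnt" 2>&1)
        _status=$?
        printf '%s\n' "$_out"
        # A failure (e.g. ENOSPC) is unlikely to improve with a higher -dusage.
        [ "$_status" -eq 0 ] || return "$_status"
        if [ "$(printf '%s\n' "$_out" | relocated_chunks)" -gt 0 ]; then
            return 0
        fi
    done
    return 1
}
```

For example: sudo sh -c '. ./balance-helpers.sh && balance_until_progress /mnt/durable', where balance-helpers.sh is a hypothetical file holding the two functions.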
If the balance fails then I would try rebooting and/or freeing up some space by removing files.
Since you're running btrfs with a RAID setup, try running a balance operation.
btrfs balance start /var/opt/gitlab
If this gives an error about not having enough space, try again with this syntax:
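btrfs balance start -musage=0 -dusage=0 /var/opt/gitlab
(An explicit -susage=0 filter is refused unless you also pass -f/--force; without it, the -musage filter is applied to the system chunks as well.)
Repeat this operation for each btrfs filesystem where you are seeing errors about space. Note that all five btrfs mount points in the question are subvolumes of the same filesystem (/dev/xvdh), so balancing one of them covers all of them. If your space problem is due to the metadata not being distributed across the mirrored disks, this might free up some space for you.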
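To run a balance once per filesystem rather than once per subvolume mount, a sketch that picks one mount point per btrfs device from mount output, assuming the usual "DEV on MOUNTPOINT type btrfs (OPTIONS)" format. one_mountpoint_per_fs is a hypothetical name, and de-duplicating by device path is an approximation; the filesystem UUID would be more robust for multi-device setups.

```shell
# Print one mount point per btrfs device from `mount` output on stdin,
# skipping further (subvolume) mounts of the same device. Expects lines like
#   DEV on MOUNTPOINT type btrfs (OPTIONS)
one_mountpoint_per_fs() {
    awk '$5 == "btrfs" && !seen[$1]++ { print $3 }'
}
```

For example: for mnt in $(mount -t btrfs | one_mountpoint_per_fs); do btrfs balance start -musage=0 -dusage=0 "$mnt"; done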
On my system, I added the following job to cron.monthly.
The clear_cache remount is due to some corruption issues btrfs was having with the free-space cache. (I think they finally found the issue, but it is so annoying that I'm willing to pay to rebuild the cache once a month.)
I ramp up the usage options to free up space gradually, with larger and larger balances.
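#!/bin/bash
# Monthly btrfs maintenance: rebuild the free-space cache, then run
# progressively larger filtered balances on every mounted btrfs filesystem.
# bash rather than sh, so that "time" is always available as a keyword.
for mountpoint in $(mount -t btrfs | awk '{print $3}' | sort -u)
do
    echo --------------------------
    echo "Balancing $mountpoint :"
    echo --------------------------
    echo "remount with clear_cache..."
    mount -o remount,clear_cache "$mountpoint"
    echo Before:
    /usr/sbin/btrfs fi show "$mountpoint"
    /usr/sbin/btrfs fi df "$mountpoint"
    # Small usage values only rewrite nearly empty chunks; each pass goes further.
    for size in 0 1 5 10 20 30 40 50 60 70 80 90
    do
        time /usr/sbin/btrfs balance start -v -musage=$size "$mountpoint" 2>&1
        time /usr/sbin/btrfs balance start -v -dusage=$size "$mountpoint" 2>&1
    done
    echo After:
    /usr/sbin/btrfs fi show "$mountpoint"
    /usr/sbin/btrfs fi df "$mountpoint"
done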
If you get to the point where you can't rebalance because there is insufficient space, the usual recommendation is to temporarily add another block device to the volume (for example, a loopback device backed by a file on another disk) for the duration of the rebalance, and then remove it.
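A sketch of that workaround using a file-backed loop device. Assumptions not in the answer: the backing file sits on a disk that is not part of this btrfs filesystem, 4G is enough, and the helper names run and balance_with_temp_device are hypothetical. Set DRY_RUN=1 to print the commands instead of running them. With a RAID10 profile like the question's, new chunks need unallocated space on several devices at once, so a single extra device may not be enough there.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: balance_with_temp_device MOUNTPOINT IMAGE_FILE [SIZE]
# IMAGE_FILE must live on a disk that is NOT part of the btrfs filesystem.
balance_with_temp_device() {
    _mnt=$1
    _img=$2
    _size=${3:-4G}                     # assumed to be enough headroom
    run truncate -s "$_size" "$_img" || return 1
    if [ -n "$DRY_RUN" ]; then
        _loop=/dev/loopX               # placeholder: nothing is attached in a dry run
    else
        _loop=$(losetup --find --show "$_img") || return 1
    fi
    run btrfs device add "$_loop" "$_mnt" || return 1
    # With some unallocated space available again, the filtered balance can proceed.
    run btrfs balance start -dusage=5 "$_mnt"
    # Removing the device moves any chunks on it back to the original disks.
    run btrfs device delete "$_loop" "$_mnt" || return 1
    run losetup -d "$_loop"
    run rm -f "$_img"
}
```

A dry run such as DRY_RUN=1 balance_with_temp_device /mnt/durable /tmp/btrfs-spare.img prints the six commands in order without touching anything.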