"No space left on device" error despite having plenty of space, on btrfs

I'm getting failures almost everywhere in my logs, all complaining about "No space left on device".

Gitlab logs:

==> /var/log/gitlab/nginx/current <==
2016-11-29_20:26:51.61394 2016/11/29 20:26:51 [emerg] 4871#0: open() "/var/opt/gitlab/nginx/nginx.pid" failed (28: No space left on device)

Dovecot email logs:

Nov 29 20:28:32 aws-management dovecot: imap([email protected]): Error: open(/home/vmail/emailuser/Maildir/dovecot-uidlist.lock) failed: No space left on device

Output of df -Th

Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/xvda1     ext4      7.8G  3.9G  3.8G  51% /
devtmpfs       devtmpfs  1.9G   28K  1.9G   1% /dev
tmpfs          tmpfs     1.9G   12K  1.9G   1% /dev/shm
/dev/xvdh      btrfs      20G   13G  7.9G  61% /mnt/durable
/dev/xvdh      btrfs      20G   13G  7.9G  61% /home
/dev/xvdh      btrfs      20G   13G  7.9G  61% /opt/gitlab
/dev/xvdh      btrfs      20G   13G  7.9G  61% /var/opt/gitlab
/dev/xvdh      btrfs      20G   13G  7.9G  61% /var/cache/salt

Looks like there is also plenty of inode space. Output of df -i

Filesystem     Inodes  IUsed  IFree IUse% Mounted on
/dev/xvda1     524288 105031 419257   21% /
devtmpfs       475308    439 474869    1% /dev
tmpfs          480258      4 480254    1% /dev/shm
/dev/xvdh           0      0      0     - /mnt/durable
/dev/xvdh           0      0      0     - /home
/dev/xvdh           0      0      0     - /opt/gitlab
/dev/xvdh           0      0      0     - /var/opt/gitlab
/dev/xvdh           0      0      0     - /var/cache/salt

Output of btrfs fi show

Label: none  uuid: 6546c241-e57e-4a3f-bf43-fa933a3b29f9
        Total devices 4 FS bytes used 11.86GiB
        devid    1 size 10.00GiB used 10.00GiB path /dev/xvdh
        devid    2 size 10.00GiB used 9.98GiB path /dev/xvdi
        devid    3 size 10.00GiB used 9.98GiB path /dev/xvdj
        devid    4 size 10.00GiB used 9.98GiB path /dev/xvdk

Output of btrfs fi df /mnt/durable

Data, RAID10: total=17.95GiB, used=10.12GiB
Data, single: total=8.00MiB, used=0.00
System, RAID10: total=16.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID10: total=2.00GiB, used=1.74GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=272.00MiB, used=8.39MiB

What could be the cause of this? I'm using a base Linux AMI on EC2, kernel version 4.4.5-15.26.amzn1.x86_64

Update

Running the command suggested below, btrfs fi balance start -dusage=5 /mnt/durable, gave me the following error:

ERROR: error during balancing '/mnt/durable' - No space left on device There may be more info in syslog - try dmesg | tail

After manually deleting a bunch of larger files totaling ~1GB, I rebooted the machine and tried again (making sure I was using sudo), and the command executed. I then rebooted the machine once more for good measure, and that seems to have solved the problem.


Welcome to the world of BTRFS. It has some tantalizing features but also some infuriating issues.

First off, some info on your setup: it looks like you have four drives in a BTRFS "raid 10" volume, so all data is stored twice on different disks. This volume is then carved up into subvolumes mounted at different mount points. The subvolumes share a single pool of disk space but have separate inode numbers and can be mounted in different places.
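If you want to double-check that layout, you can list the subvolumes from the top-level mount and see how they are mounted (this assumes /mnt/durable is the top-level mount of the volume):

btrfs subvolume list /mnt/durable
grep btrfs /proc/mounts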

BTRFS allocates space in "chunks"; each chunk is dedicated to a specific class, either data or metadata. What can happen (and looks like it has happened in your case) is that all of the free space gets allocated to data chunks, leaving no room for metadata.
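If your btrfs-progs is new enough (roughly 3.18 or later), you can see the whole allocation picture in one place, including how much raw space is still unallocated, i.e. not yet assigned to any data or metadata chunk:

btrfs filesystem usage /mnt/durable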

It also seems (for reasons I don't fully understand) that BTRFS "runs out" of metadata space before the reported proportion of metadata space used reaches 100%.

This appears to be what has happened in your case: there is lots of free space inside the data chunks, but no space left that hasn't already been allocated to chunks, and not enough free space in the existing metadata chunks.
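You can see this in the output you posted: btrfs fi show reports every device as essentially fully allocated (10.00GiB of 10.00GiB on devid 1, 9.98GiB of 10.00GiB on the rest), while btrfs fi df shows the RAID10 data chunks only a little over half full (10.12GiB used of 17.95GiB) and the metadata chunks nearly full (1.74GiB used of 2.00GiB). There is no unallocated space left from which to create a new metadata chunk, even though plenty of space remains inside the data chunks.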

The fix is to run a "rebalance". This moves data around so that some chunks can be returned to the "global" free pool, where they can be reallocated as metadata chunks:

btrfs fi balance start -dusage=5 /mnt/durable

The number after -dusage sets how aggressive the rebalance is, i.e. how close to empty a chunk has to be before it gets rewritten. If the balance says it relocated 0 chunks, try again with a higher -dusage value.
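For example, you could step the threshold up until chunks actually start getting relocated (just a sketch; adjust the mount point and values to suit):

btrfs fi balance start -dusage=0 /mnt/durable
btrfs fi balance start -dusage=10 /mnt/durable
btrfs fi balance start -dusage=25 /mnt/durable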

If the balance fails then I would try rebooting and/or freeing up some space by removing files.


Since you're running btrfs with a RAID setup, try running a balance operation.

btrfs balance start /var/opt/gitlab

If this gives an error about not having enough space, try again with this syntax:

btrfs balance start -musage=0 -dusage=0 -susage=0 /var/opt/gitlab 

Repeat this operation for each btrfs filesystem where you are seeing errors about space. If your space problem is due to the metadata not being distributed across the mirrored disks, this might free up some space for you.


On my system, I added the following job in cron.monthly.

The clear_cache remount is there because of some corruption issues btrfs was having with the free space cache. (I think they finally found the issue, but it's annoying enough that I'm willing to pay the cost of rebuilding the cache once a month.)

I ramp up the usage options to free up space gradually for larger and larger balances.

#!/bin/sh

for mountpoint in `mount -t btrfs | awk '{print $3}' | sort -u`
do
    echo --------------------------
    echo Balancing $mountpoint :
    echo --------------------------
    echo remount with clear_cache...
    mount -oremount,clear_cache $mountpoint
    echo Before:
    /usr/sbin/btrfs fi show $mountpoint
    /usr/sbin/btrfs fi df $mountpoint
    for size in 0 1 5 10 20 30 40 50 60 70 80 90
    do
        time /usr/sbin/btrfs balance start -v -musage=$size $mountpoint 2>&1
        time /usr/sbin/btrfs balance start -v -dusage=$size $mountpoint 2>&1
    done
    echo After:
    /usr/sbin/btrfs fi show $mountpoint
    /usr/sbin/btrfs fi df $mountpoint
done

If you get to the point where you can't rebalance because you have insufficient space, the recommendation is to temporarily add another block device (or loopback device on another disk) of some sort to your volume for the duration of the rebalance, and then remove it.
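A rough sketch of the loopback approach (the backing file path and size are placeholders; the file must live on a different filesystem that still has free space, e.g. the root volume here):

# create a backing file elsewhere and attach it as a loop device
truncate -s 2G /root/btrfs-spill.img
LOOPDEV=$(losetup -f --show /root/btrfs-spill.img)

# add it to the volume, rebalance, then take it back out
btrfs device add "$LOOPDEV" /mnt/durable
btrfs balance start -dusage=5 /mnt/durable
btrfs device delete "$LOOPDEV" /mnt/durable

# detach the loop device and clean up
losetup -d "$LOOPDEV"
rm /root/btrfs-spill.img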