Btrfs on SSD, "no space left on device"; catch-22 with `fstrim` and `btrfs balance`; how to recover?

The root filesystem of my Kubuntu (mounted at /) is Btrfs. I don't use -o discard as a mount option, so I need to run fstrim on demand.

In the past I hit this problem: btrfs, no diskspace left. I noticed fstrim -v / showed almost no space being trimmed. My solution was to run btrfs balance start / before fstrim. This is the gist of my answer there.

Today it's different. Maybe I'm too late with the maintenance. This is what happens:

# fstrim -v /
/: 24 KiB (24576 bytes) trimmed
# btrfs balance start /
ERROR: error during balancing '/': No space left on device

I deleted a few subvolumes (snapshots) with btrfs subvolume delete … and it didn't help. I cannot remember the details very well, but I think previously I could run btrfs balance … because the preliminary fstrim trimmed at least a few MiB, not as little as 24 KiB like today. Now it looks like a catch-22 where fstrim or btrfs balance would only work if the other did its job first.

For the record, these are some stats that show I have in fact plenty of space:

# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       112G   43G   68G  39% /

# btrfs fi df /
Data, single: total=108.73GiB, used=41.00GiB
System, single: total=64.00MiB, used=16.00KiB
Metadata, single: total=3.00GiB, used=1.02GiB
GlobalReserve, single: total=352.00MiB, used=0.00B

Note that I haven't hit "no space left on device" during normal operation yet. I think Btrfs keeps fitting new writes into already allocated chunks. In the past, however, I hit "no space left …" during apt-get upgrade and recovered with btrfs balance and fstrim. I don't know when (or if) this will strike again, so I'd like to do my maintenance before it happens in the middle of something important.
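
To make the "plenty of space" claim more precise: the total= values are what Btrfs has allocated to chunks, while used= is what the files actually occupy. Below is a minimal sketch that sums the allocated part, assuming the btrfs fi df output format shown above. allocated_mib is a hypothetical helper of mine, not part of btrfs-tools, and it skips GlobalReserve on the assumption that this reserve lives inside Metadata.

```shell
# Hypothetical helper (not part of btrfs-tools): reads `btrfs fi df` output
# on stdin and prints the space allocated to chunks, in MiB.
allocated_mib() {
  LC_ALL=C awk -F'total=' '
    /^GlobalReserve/ { next }            # assumed to live inside Metadata; skip to avoid double counting
    NF > 1 {
      split($2, part, ",")               # part[1] is e.g. "108.73GiB"
      value = part[1] + 0                # leading number
      unit = part[1]
      sub(/^[0-9.]+/, "", unit)          # what is left is the unit
      if      (unit == "TiB") value *= 1048576
      else if (unit == "GiB") value *= 1024
      else if (unit == "KiB") value /= 1024
      else if (unit == "B")   value /= 1048576
      sum += value                       # MiB needs no conversion
    }
    END { printf "%.0f\n", sum }
  '
}

# Usage on a live system (as root):
#   btrfs fi df / | allocated_mib
```

Fed with the four lines of my output, this prints 114476 (about 111.8 GiB), which is practically the whole 112G device. So the files use only 43G, but almost nothing is left unallocated; this would explain why btrfs balance cannot find room for a new chunk and why fstrim finds so little to trim.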

How can I recover from this situation so that fstrim and btrfs balance do not block each other? Can I fix this from within my running system?

In fact I have already fixed this, my answer is below. The question is for future reference. Feel free to add another solution.


Additional information:

$ uname -a
Linux foobar 4.4.0-78-generic #99-Ubuntu SMP […] x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Ubuntu 16.04.3 LTS \n \l

# dpkg -l | grep btrfs
ii  btrfs-tools  4.4-1ubuntu1  amd64  Checksumming Copy on Write Filesystem utilities

Yes, you can recover from within your running system. My original approach is down below; however, thanks to Zan Lynx's comment I found an easier way.

My improved approach

This is the mentioned comment:

Or if you're thinking ahead you can tell btrfs to use less than maximum of the device with btrfs filesystem resize

(Compared to my original approach, the point is to deliberately keep some free space on this particular device and expand the filesystem into it, rather than adding a separate device, which may not be that easy.)

Good news: my tests indicate I don't have to think ahead! Even if btrfs balance start / throws "no space left …", I'm still able to shrink the filesystem, as long as there is room for it (i.e. all files and metadata fit into the new size). This leads to the following solution:

# btrfs filesystem resize -100M /  # shrink a little...
Resize '/' of '-100M'
# btrfs filesystem resize +100M /  # ... and expand back
Resize '/' of '+100M'
# btrfs balance start /            # should work now
Done, had to relocate 88 out of 88 chunks
# fstrim -v /
/: 67,8 GiB (72753831936 bytes) trimmed
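
The four commands above can be wrapped into one function. This is only a sketch: run and nudge_and_trim are hypothetical names of mine, and DRY_RUN is a variable I made up so that by default the function only prints what it would do. The commands themselves are exactly the ones from the transcript.

```shell
# Hypothetical wrapper around the four commands above.
# DRY_RUN defaults to 1, so nothing is executed unless you opt in.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"      # only show the command
  else
    "$@"
  fi
}

nudge_and_trim() {
  mnt=$1
  delta=${2:-100M}          # same amount for shrinking and expanding
  run btrfs filesystem resize "-$delta" "$mnt" &&
  run btrfs filesystem resize "+$delta" "$mnt" &&
  run btrfs balance start "$mnt" &&
  run fstrim -v "$mnt"
}

# Preview:        nudge_and_trim /
# Really run it:  DRY_RUN=0 nudge_and_trim /     (as root)
```

The steps are chained with &&, so a failing step stops the sequence. If the shrink succeeds but the expand fails, the filesystem stays smaller by the given amount until you expand it manually.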

My original approach

This is what you need to do (detailed description down below):

  1. Add an extra device to the Btrfs filesystem.
  2. btrfs balance start …
  3. fstrim …
  4. Delete the extra device from the Btrfs filesystem.
  5. btrfs balance start …
  6. fstrim …

The trick is to add an extra device to the Btrfs filesystem, so btrfs balance … has some additional space. The device may be something like /dev/sdb or /dev/sdb3. In this example I'm using a regular 1 GiB file on my HDD (very important: I double-check that the file doesn't belong to the Btrfs filesystem I want to expand! This could be fatal). I think a file in RAM (e.g. in /dev/shm/) should do just as well.
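
The double check can be partly automated. This is a cautious sketch that assumes findmnt from util-linux supports --target and the UUID column on your system; fs_uuid and on_different_fs are hypothetical helpers of mine. It refuses whenever a UUID cannot be determined, which includes tmpfs locations like /dev/shm/, so in that case you need to check by hand.

```shell
# Prints the UUID of the filesystem that contains the given path.
# Assumes findmnt from util-linux; the UUID is empty for e.g. tmpfs.
fs_uuid() {
  findmnt -n -o UUID --target "$1"
}

# Hypothetical helper: succeeds only if both UUIDs are known and differ.
on_different_fs() {
  a=$(fs_uuid "$1") || return 1
  b=$(fs_uuid "$2") || return 1
  [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
}

# Usage:
#   on_different_fs /mnt/hdd / || echo "do not create the file here"
```

I compare filesystem UUIDs rather than device numbers because Btrfs subvolumes of the same filesystem can report different device numbers.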

# tmpf=/mnt/hdd/tempfile   # if this file exists, it will be overwritten!
# truncate -s 1G "$tmpf"
# extra=$(losetup -f --show "$tmpf")

Now $extra is like /dev/loop0 or something.

# btrfs device add "$extra" /

At this point I mustn't reboot my OS. If I did, it would lack a part of its root filesystem because no /dev/loop* would be associated with /mnt/hdd/tempfile. This will not be a problem if you use a regular device (or a partition) as the extra device, because btrfs device scan during boot will detect it.

# btrfs balance start /

In my case the tempfile is a sparse file. In another console I run watch ls -hls /mnt/hdd/tempfile and I notice when it grows to (almost) its full size. This way I know when some Btrfs chunks have been moved off the SSD. When in any doubt, let btrfs balance … finish; but I invoke btrfs balance cancel / to save some time. Now let's go back to the main console.
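
If you prefer numbers over eyeballing ls -hls, the same information can be read with stat. A minimal sketch, assuming GNU stat; sparse_usage is a hypothetical helper of mine.

```shell
# Hypothetical helper: prints "<allocated bytes> <apparent bytes>" for a file.
# Assumes GNU stat: %b = allocated blocks, %B = bytes per block, %s = apparent size.
sparse_usage() {
  set -- $(stat -c '%b %B %s' "$1")   # three numbers, so word splitting is safe
  echo "$(( $1 * $2 )) $3"
}

# Usage, in another console:
#   while sleep 2; do sparse_usage /mnt/hdd/tempfile; done
```

When the first number gets close to the second one, the sparse file is (almost) fully allocated, which is the moment I look for above.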

Note: the first line below is from the above btrfs balance start / command that was interrupted.

balance canceled by user
# fstrim -v /
/: 26,7 GiB (28696862720 bytes) trimmed

fstrim trimmed way more than before. I don't need my extra device anymore.

# btrfs device delete "$extra" /   # may take a while
# btrfs balance start /            # should work now
Done, had to relocate 88 out of 88 chunks
# fstrim -v /
/: 67,8 GiB (72753831936 bytes) trimmed

And this is it. Now it's time to clean up:

# losetup -d "$extra"
# rm "$tmpf"
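
Detaching the loop device or removing the file while the device is still a member of the filesystem would be as bad as the reboot mentioned earlier, so before the cleanup it may be worth confirming that btrfs device delete really finished. A minimal sketch, assuming btrfs filesystem show lists each member on a line ending with "path" followed by the device name; still_member is a hypothetical helper of mine.

```shell
# Hypothetical helper: reads `btrfs filesystem show` output on stdin and
# succeeds if the given device is still listed as a member ("... path <dev>").
still_member() {
  awk -v dev="$1" '
    NF >= 2 && $(NF-1) == "path" && $NF == dev { found = 1 }
    END { exit !found }
  '
}

# Usage (as root), before losetup -d and rm:
#   btrfs filesystem show / | still_member "$extra" && echo "still in use, do not detach"
```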