Unable to create files on large XFS filesystem

We have a Linux server with a 4 TB filesystem, which is used to store Subversion repositories. It holds many repositories, some of which have been in use for several years.

The disk was originally about 1 TB, but we started running out of space and increased it to 4 TB about a year ago. Now people are reporting that they can't check files in to their repos. The error message is "No space left on device".

The disk has about 1.5 TB free and also reports having free inodes, yet it's not possible to create a new file on it. Updating existing files still works, and intermittently a commit to some repository will succeed, but the same repository may fail on the next attempt.


Solution 1:

The reason for the problem

The issue turns out to be in how XFS allocates inodes. Unlike most file systems, where the inode table is created when the filesystem is made, XFS allocates inodes dynamically as new files are created. However, unless you specify otherwise, inode numbers are limited to 32-bit values, and because an XFS inode number encodes the inode's physical location on disk, all inodes must (with the default block and inode sizes) fit within the first terabyte of the file system. So if you completely filled that first terabyte and then enlarged the disk, you would still be unable to create new files, since new inodes can't be allocated in the added space.
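A quick way to check whether this applies to you is to look at the current mount options and the filesystem geometry (the mountpoint /extra is the one used in the examples below):

    # inode64 in the options means 64-bit inode numbers are already enabled
    grep /extra /proc/mounts

    # shows the block size, AG count and AG size for the filesystem
    xfs_info /extra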

Option 1 - change mount options

One solution is to re-mount the file system with the mount option inode64, which allows inodes to be allocated anywhere on the disk. However, some applications behave weirdly with 64-bit inode numbers (e.g. MySQL), and NFS clients that can't handle them will be very confused. So if you're not sure that your system will work with this option, you can move on to the next option.
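For reference, enabling it might look like this (mountpoint and device match the examples below; note that on older kernels inode64 could not be changed by a remount, so a full unmount/mount may be needed):

    # Remount with 64-bit inode allocation
    mount -o remount,inode64 /extra

    # To make it permanent, add inode64 to the options field in /etc/fstab:
    # /dev/CACHE/CACHE  /extra  xfs  defaults,inode64  0  0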

Option 2 - move files

The second option is to find some of the files that are currently stored in the first terabyte and move them to another area of the file system.

Moving by age

In our case, this was easy - the file system had been in use for years, so we could simply find the oldest files, move them off the file system, and then move them back. This was easily done using find:

find /extra -mindepth 3 -maxdepth 3 -type d -mtime +730 -exec du -sh {} \; > /tmp/olddirs.txt

gave us a list containing the size and name of every directory exactly 3 levels below the mountpoint that had not been modified in over 2 years (-mtime +730). We could then sort the list to find the largest directories, and use mv to move them to another file system and back again, as sketched below.
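A sketch of that sort-and-move step (the repository path and the scratch filesystem here are hypothetical; adjust them to your layout):

    # Largest old directories first (sort -h understands du's human-readable sizes)
    sort -rh /tmp/olddirs.txt | head -20

    # Moving a directory to another filesystem and back frees its old blocks
    # and re-allocates them, hopefully outside the crowded first terabyte
    mv /extra/repos/old-project /mnt/scratch/
    mv /mnt/scratch/old-project /extra/repos/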

Moving by allocation group

If you can't simply go by age, e.g. when a lot of files were created at the same time, you can still find the right files to move, but it takes a bit more time.

XFS has allocation groups (aka AGs), numbered from 0. You can check the block size and the number of blocks in each AG with xfs_info /path/to/mountpoint, to work out which groups lie within the first terabyte. Or you can just check the first few AGs to see which ones are full, and then clear those.
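As an illustration (the mountpoint and the numbers are made up; read the real ones from your own xfs_info output):

    # agsize is reported in blocks; multiply by bsize to get bytes per AG
    xfs_info /extra | grep -E 'agcount|bsize'
    # If agsize=67108864 blks and bsize=4096, each AG is 256 GiB,
    # so AGs 0-3 make up the first terabyte.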

  1. Check the free space in the first few AGs (0 through 5 here):
for ag in `seq 0 1 5`; do echo "free space in AG $ag"; xfs_db -r -c "freesp -s -a $ag" /dev/CACHE/CACHE | grep "total free"; done

If the total free blocks in any group is less than 40, you won't be able to create new files in it: XFS allocates inodes in chunks, so it needs a decent contiguous run of free blocks in the AG.

  2. Find the files in that AG

This requires checking the metadata of every file on the filesystem, so it will take a long time. Here's a suggestion:

   find /extra -mindepth 3 -type f -exec xfs_bmap -v {} \; > /tmp/agfilelist.txt

You can then grep the output for " 0 " (that's a space, a zero and another space) to find all files with extents in AG 0, grep for " 1 " to find the ones in AG 1, and so on - the AG number is a column in the xfs_bmap -v output. Start with AG 0, move the largest files away (using mv to another file system, not cp!) and then back again. Repeat until you have a fair amount of space free.
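A sketch of the filtering step, assuming the output file from the find command above (grep -B pulls in the preceding lines, so you can see which file each extent belongs to):

    # Extent lines in AG 0, plus a little context to show the filename
    # and header that xfs_bmap -v prints above them
    grep -B2 " 0 " /tmp/agfilelist.txt | less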

Outcome

Once we'd moved enough files away from /extra and then back again, there was lots of space in AG 0 and it was once again possible to create new files.
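To confirm the fix, you can re-run the free-space check from step 1 and try creating a file (same illustrative device and mountpoint as above):

    xfs_db -r -c "freesp -s -a 0" /dev/CACHE/CACHE | grep "total free"
    touch /extra/.write-test && rm /extra/.write-test && echo "file creation works"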