Figuring out why I'm going over hard-drive quota

I suck at system administration, so if I'm getting something basic wrong, please let me know.

Here is something that drives me nuts. At work, we have a big NFS server that serves all the employees of our company. Everyone is allowed to write a certain number of GBs to it. I often get "quota exceeded" errors, because I run programs that generate a lot of temporary files and then delete them, but they hit the quota before they get a chance to delete anything.

After talking with our sysadmins, I learned that my quota had already been increased to well beyond what I need for these tests, but it seems that I'm spending it in places other than my home folder. The sysadmin explained to me that every file on the NFS server that has my username as its owner counts against my quota.

I wanted to get a list of these files so I could delete the ones I don't need anymore. But he told me that the only way to get it is to search the entire filesystem of the entire company, going through everyone's home folders: a time-consuming process. He's running this search right now.

What sounds weird to me is this: when Linux gives me a "quota exceeded" error, it seems to know instantly that I'm over my quota. That's not a time-consuming process. So how come I can't get a list of the files counted against my quota without doing a long search?


I can think of two things that might be causing your quota problems.

First, you should know that quotas are implemented by creating a tiny database on the filesystem, which is updated each time a file is created, modified or deleted. (Actually there are two of them, one for user quotas and one for group quotas.) When quotas were first turned on, this database was initialized by checking the usage of every file on the filesystem and recording the results per user and/or per group in these files. Because they are kept up to date by the filesystem driver every time there is activity, looking up a user's current quota usage is fast.
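
To see why the lookup is instant, here is a toy model of that bookkeeping (bash, purely illustrative; the real accounting lives in the kernel and the aquota files, not in shell variables): usage is charged to a per-user counter at write time, so reading a total never needs a walk over the filesystem.

```shell
#!/usr/bin/env bash
# Toy per-user ledger: updated on every allocation, O(1) to read.
declare -A usage_kb

charge() {   # charge <user> <kb>: roughly what the fs driver does on each write
    usage_kb[$1]=$(( ${usage_kb[$1]:-0} + $2 ))
}

charge alice 4096
charge alice 1024
charge bob   512

# Answering "how much is alice using?" is a hash lookup,
# not a scan of every file she owns:
echo "alice: ${usage_kb[alice]} KB"
echo "bob:   ${usage_kb[bob]} KB"
```

This is also why the reverse question ("which files make up that total?") is slow: the ledger stores only the running sum per user, not the list of files behind it, so recovering the list means scanning the filesystem.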

There is a problem, though. The quota databases can be corrupted if the filesystem isn't unmounted cleanly, for instance after a hard power-off. When this happens, the admin should run quotacheck on reboot to verify and rebuild them, but that might not have happened. Cosmic rays or hard-drive failure could also corrupt them.

Running quotacheck, however, requires that the filesystem be unmounted, or at minimum mounted read-only, so it's unavailable for use while the quota database is being rebuilt. This could take a long time, so it is something that unfortunately rarely gets done. The NFS server admin should schedule downtime to check the filesystem quotas, and should consider changing procedures so that quotacheck is always run when rebooting after a crash.

Second, based on your description, it's possible that you've hit your inode quota. In addition to restricting the amount of disk space, quotas can also restrict the number of files (inodes) a user may create. If you create large numbers of temporary files, this may be what is happening. You (or the NFS server admin) should check this too: run quota -s to see what the database thinks you have used, compared against your limits.


Other than the corruption possibility, which @MichaelHampton suggests, here are some basics:

  • Check under /tmp. Some processes are messy, or get interrupted or killed, and leave things behind: session files, half-finished installs/unzips, print jobs and suchlike. Look for locked or hidden (dot-)files. If you find something, don't just delete it; use the timestamps to figure out what created it and when.

  • Also run ps -edalf, review your process list and all its file arguments, and see if any mystery process is creating unwanted large files or writing in unexpected places.

  • "It seems that I'm spending this quota in places other than my home folder." Well, do you know roughly which directory is taking up how much space, or are you completely in the dark?

  • Figure out whether you have lots of small files/directories, some large files, or both. Try a quick estimate with du -sh ... or find -size <threshold> ... . To see if newer files are being created, touch a sentinel file when you log out in the evening, then next day run find ... -newer SENTINEL to see if anything has appeared since. You could cron that to run in the middle of the night.

  • I guess a total failsafe method, if all else fails (hard to imagine, but still), would be to have them temporarily create a second home directory for you, gradually clone your setup over, note which applications you enable, and see when things go boom. (Binary-search triage, you know.)
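
The /tmp sweep from the first bullet can be scripted with find. A sketch, which plants a throwaway file first so the demo reliably finds something (the quota-demo name is made up; in real use drop the -name test and add something like -mmin +60 to catch old leftovers):

```shell
# Plant a demo file so there is something to find.
demo="/tmp/quota-demo.$$"
touch "$demo"

# List files under /tmp owned by our uid.
# -xdev stays on one filesystem; -maxdepth 1 keeps the demo quick.
hits=$(find /tmp -maxdepth 1 -xdev -type f -user "$(id -u)" \
        -name 'quota-demo.*' 2>/dev/null)
printf '%s\n' "$hits"

rm -f "$demo"
```

Against the real NFS mount this is the same per-user search your sysadmin is running, just scoped to one directory instead of the whole tree.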
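
To go beyond eyeballing ps arguments: on Linux, every process lists its open files under /proc/<pid>/fd as symlinks, so you can see exactly where a suspect process is writing. A sketch inspecting the current shell as a stand-in for a suspect pid:

```shell
# Each entry under /proc/<pid>/fd is a symlink to an open file.
pid=$$   # demo: inspect this very shell; substitute a suspect pid
fds=$(ls -l "/proc/$pid/fd" 2>/dev/null)
printf '%s\n' "$fds"
# Where lsof is installed, "lsof -p <pid>" gives a friendlier view.
```

This is Linux-specific (/proc); on other systems lsof or fstat is the equivalent.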
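
The du estimate and the sentinel trick from the fourth bullet can be sketched together. This demo runs against throwaway temp directories so it is self-contained; in real use you would point du at your home directory and leave the sentinel overnight (all file names here are made up):

```shell
# Part 1: rough size breakdown, biggest entries first (GNU du + sort).
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/big.bin" bs=1024 count=64 2>/dev/null
report=$(du -k "$workdir" | sort -rn)
printf '%s\n' "$report"

# Part 2: the sentinel trick. Anything -newer than the sentinel
# was created or modified after we touched it.
sentinel=$(mktemp)          # stand-in for e.g. ~/.quota-sentinel
sleep 1
echo "fresh" > "$workdir/new-file"
newer=$(find "$workdir" -type f -newer "$sentinel")
printf 'created since sentinel: %s\n' "$newer"

rm -rf "$workdir"
rm -f "$sentinel"
```

Note that big.bin, created before the sentinel, is not reported; only new-file is. Cron the find half overnight and you get a daily log of what appeared while you weren't looking.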