How to remove a directory on an NFS filesystem with an enormous number of files

A poorly tested program created a directory on an NFS share with an enormous number of files, which I need to remove.

ls -ald /home/foo
drwxrwxr-x 2 503 503 317582336 Jul 29 11:38 /home/foo

The directory is located on an NFS mount of about 600GB on a netapp-type device. I actually have no idea how many files are in it, but a similar directory created after only 10 minutes has 121,000 files, so it's probably somewhere in the millions. The OS is Linux with a 2.6 kernel.

I'm trying to find a way to list or remove it and its contents. Running find /home/foo results in find dying after about an hour, with no output other than "./"


Solution 1:

(Answering my own question in case anyone finds it while searching for something similar.) There are possibly as many as 9 million files in the directory.

Unfortunately I can't log in to the server directly; it's an appliance. The only access to the filesystems is via the export.

rm -rf didn't seem to work; watching it with strace showed it was hanging.

find wouldn't complete; it died with no error.

ls -1 never seemed to complete. (I realize now that it attempts to sort the results; ls -1f, which skips the sort, might have worked eventually.)
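
To illustrate (my example, not part of the original attempt): with -f, ls does not sort, so entries stream out as they are read from the directory:

    # -f disables sorting; head confirms that output is actually flowing
    ls -1f /home/foo | head -20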

What did work was a simple Perl snippet. I assume C code to do the same would work.

    # stream directory entries one at a time; no sorting, no stat
    opendir( my $dh, '/home/foo' ) or die $!;
    while ( my $file = readdir $dh ) {
        print "$file\n";
    }
    closedir $dh;
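
For the removal itself, the same streaming approach can unlink each entry as it is read. A sketch of that (my extension of the snippet above, untested against the appliance):

    # stream through the directory, unlinking entries as they are read
    perl -e 'chdir "/home/foo" or die $!;
        opendir(my $dh, ".") or die $!;
        while (my $f = readdir $dh) {
            next if $f eq "." or $f eq "..";   # never unlink . or ..
            unlink $f or warn "unlink $f: $!\n";
        }
        closedir $dh;'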

Solution 2:

This rather old thread came up for me on Google, so I'd like to share some statistics.

Here is a comparison of three different methods to remove files on an NFS server:

  1. plain rm: rm dir/*
  2. find: find dir/ -type f -exec rm {} \;
  3. rsync: tempdir=$(mktemp -d); rsync -a --delete $tempdir/ dir/; rmdir $tempdir (spelled out below)
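
For readability, here is the rsync method from item 3 spelled out step by step (dir/ stands for the directory to empty):

    tempdir=$(mktemp -d)                 # create an empty scratch directory
    rsync -a --delete "$tempdir"/ dir/   # make dir/ identical to the empty directory
    rmdir "$tempdir"                     # remove the scratch directory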

To compare these methods, I created 10000 files each time I ran a test with

for i in {1..10000} ; do touch $i ; done
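
The numbers below can be reproduced with a small harness along these lines (my sketch; only the creation loop is from the original test):

    # recreate the test files, then time one of the removal methods
    mkdir -p dir && ( cd dir && for i in {1..10000} ; do touch $i ; done )
    time rm dir/*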

The results on the plot show that rsync is much faster and find is the slowest of the three methods.

[plot: performance of different methods to remove multiple files; rsync is faster]

The results hold when the number of files is doubled (I did not run find on 20000 files); times are averaged over 3 runs for 10000 files and 2 runs for 20000 files.

         10000 files   20000 files
find        28.3 s          -
rm          12.9 s        23.9 s
rsync        6.94 s       12.2 s

It would be interesting to see what else the performance of these methods depends on.

A related post on this site discusses the deletion of a large number of files on an ext3 filesystem.

Solution 3:

I would suggest that you NOT try to remove these files over NFS: log in to the file server directly and delete the files there. This will be substantially less abusive to the NFS server (and the client).

Beyond that, use find (as described by MattBianco) or use ls -1 | xargs rm -f (from within that directory) if find is having trouble completing (the latter should work OK over NFS, though again I would recommend doing it locally).
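
If any of the filenames might contain whitespace, a NUL-separated variant of the same pipeline is safer (my suggestion; assumes GNU find and xargs, run from within the directory):

    # -print0/-0 pass names NUL-separated, so odd filenames survive intact
    find . -maxdepth 1 -type f -print0 | xargs -0 rm -f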

Solution 4:

Maybe find /home/foo -mount -depth -type f -exec rm -f {} \; could be helpful.
-exec makes find execute a command (terminated by the semicolon: \;), with the braces {} replaced by the file's pathname.
This means one rm process for each file to remove.
-type f restricts this to files; if you have a directory structure under /home/foo, the directories will remain. Only files will be removed.
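
One refinement (mine, assuming a find that supports the POSIX + terminator): ending the -exec with + instead of \; packs many pathnames into each rm invocation, avoiding one process per file, which matters with millions of files:

    find /home/foo -mount -depth -type f -exec rm -f {} +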