How to keep subtree removal (`rm -rf`) from starving other processes for Disk I/O?

Solution 1:

All data below was gathered from the linked page and compares several ways to delete a large directory of files. See that writeup for details of how the numbers were produced.

Command                                   Elapsed  System Time  %CPU  cs1* (Vol/Invol)
rsync -a --delete empty/ a                10.60    1.31         95%   106/22
find b/ -type f -delete                   28.51    14.46        52%   14849/11
find c/ -type f | xargs -L 100 rm         41.69    20.60        54%   37048/15074
find d/ -type f | xargs -L 100 -P 100 rm  34.32    27.82        89%   929897/21720
rm -rf f                                  31.29    14.80        47%   15134/11

*cs1 is the number of context switches (voluntary/involuntary).

Solution 2:

Removing files performs only metadata operations on the filesystem, which aren't influenced by ionice.

The simplest approach, if you don't need the disk space right away, is to run the rm during off-peak hours.
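
If cron is available, a crontab entry is one way to schedule that. This is only a sketch: /path/to/dir is a placeholder, and 03:00 is an assumed quiet hour that you would adjust for your own machine.

```
# m h dom mon dow  command
0 3 * * * rm -rf /path/to/dir
```

Since this is a one-off job, remove the entry once it has run; otherwise cron will repeat it every day.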

A more complex approach that MIGHT work is to spread the deletes out over time. You can try something like the following (note that it assumes your paths and file names do NOT contain spaces, quotes, or backslashes, all of which xargs treats specially by default):

while find dir -type f | head -n 100 | xargs rm; do sleep 2; done
while find dir -depth -type d | head -n 100 | xargs rmdir; do sleep 2; done

Also note that you can't use rm -f in the first command, because the loop would then never stop: it relies on rm exiting with an error when it is given no arguments.

You can tune it by changing the number of deletes per cycle (100 in the example) and the sleep duration. It still might not work, though, because the filesystem may batch the metadata updates in a way that overloads your I/O anyway. You just have to try it.
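
If your file names may contain spaces, the same batch-and-sleep idea can be sketched with NUL-delimited names. This is a variation, not the loop above: it runs find once and lets xargs hand out the batches. The function name throttled_rm and its arguments are hypothetical, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD both do). Like the loop above, it only removes regular files and directories, so anything else (symlinks, sockets) is left in place along with the directories that contain it.

```shell
# Sketch only: throttled_rm and its arguments are hypothetical names.
# Usage: throttled_rm DIR [BATCH_SIZE] [PAUSE_SECONDS]
throttled_rm() (
    # The function body is a subshell, so these variables do not leak out.
    dir=$1
    batch=${2:-100}
    pause=${3:-2}

    # Pass names NUL-delimited (-print0 / -0) so spaces and newlines are safe.
    # xargs starts one small shell per batch of at most $batch files; that
    # shell removes the batch and then sleeps. "$pause" arrives as $0 and the
    # file names as "$@". The guard skips the empty run some xargs versions do.
    find "$dir" -type f -print0 |
        xargs -0 -n "$batch" sh -c \
            '[ "$#" -gt 0 ] || exit 0; rm -- "$@"; sleep "$0"' "$pause"

    # Remove the now-empty directories, children before parents (-depth).
    find "$dir" -depth -type d -exec rmdir {} +
)
```

Calling `throttled_rm dir 100 2` would match the batch size and pause used in the loops above. While a batch is sleeping, find blocks on the full pipe, so the directory walk is throttled as well.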