How to keep subtree removal (`rm -rf`) from starving other processes of disk I/O?
Solution 1:
All data below was gathered from this page. It compares several ways of deleting a large directory of files. Check out the writeup for details of how the numbers were produced.
| Command | Elapsed | System Time | %CPU | cs1* (Vol/Invol) |
|---|---|---|---|---|
| `rsync -a --delete empty/ a` | 10.60 | 1.31 | 95% | 106/22 |
| `find b/ -type f -delete` | 28.51 | 14.46 | 52% | 14849/11 |
| `find c/ -type f \| xargs -L 100 rm` | 41.69 | 20.60 | 54% | 37048/15074 |
| `find d/ -type f \| xargs -L 100 -P 100 rm` | 34.32 | 27.82 | 89% | 929897/21720 |
| `rm -rf f` | 31.29 | 14.80 | 47% | 15134/11 |
*cs1 = context switches (voluntary/involuntary)
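The fastest entry in the table works by syncing an empty directory onto the target, so that `--delete` removes everything the empty source lacks. Here is a minimal sketch of that idea. It assumes `rsync` is installed, and every path in it is a throwaway placeholder created under a temporary directory, not a path from the original benchmark.

```shell
#!/bin/sh
# Hypothetical demo paths: everything lives under a throwaway temp directory.
work=$(mktemp -d)
mkdir "$work/empty" "$work/a"
touch "$work/a/file1" "$work/a/file2"

# Sync the empty directory onto the target. --delete removes every entry
# in the target that does not exist in the (empty) source.
rsync -a --delete "$work/empty/" "$work/a/"

# The target directory itself survives, now empty, so remove it separately.
rmdir "$work/a" "$work/empty"
```

The trailing slashes matter: `empty/` means "the contents of empty", so the target ends up with those contents (nothing) rather than a nested `empty` directory.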
Solution 2:
Removing files performs only metadata operations on the filesystem, which aren't influenced by ionice.
The simplest approach, if you don't need the disk space right away, is to run the `rm` during off-peak hours.
The more complex way that MIGHT work is to spread the deletes out over time. You can try something like the following (note that it assumes your paths and file names DO NOT contain any spaces!):
while find dir -type f | head -n 100 | xargs rm; do sleep 2; done
while find dir -depth -type d | head -n 100 | xargs rmdir; do sleep 2; done
Also note that you can't use `rm -f` in the first command, because then the loop would never stop: it relies on `rm` returning an error exit code when it is given no arguments.
You can tune this by changing the number of deletes per cycle (100 in the example) and the sleep duration. It may still not help, though, because the filesystem can batch the metadata updates in a way that still overloads your I/O. You just have to try it.
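If your file names may contain spaces, the same batching idea can be sketched with a `while read` loop instead of `xargs`. This is only a rough sketch: `throttled_rm` is a hypothetical helper name, not a standard tool, the batch size and pause are arbitrary defaults, and it still breaks on names containing newlines. Unlike the loops above, it walks the tree once per pass and removes every non-directory entry (not only regular files) so that the later `rmdir` pass can succeed.

```shell
#!/bin/sh
# throttled_rm DIR [BATCH] [PAUSE]
# Hypothetical helper: deletes DIR in batches, sleeping PAUSE seconds
# after every BATCH removals. Assumes no newlines in file names.
throttled_rm() {
    dir=$1
    batch=${2:-100}
    pause=${3:-2}

    # Pass 1: remove everything that is not a directory.
    n=0
    find "$dir" ! -type d -print | while IFS= read -r path; do
        rm -- "$path"
        n=$((n + 1))
        if [ $((n % batch)) -eq 0 ]; then sleep "$pause"; fi
    done

    # Pass 2: remove the now-empty directories, deepest first (-depth).
    n=0
    find "$dir" -depth -type d -print | while IFS= read -r path; do
        rmdir -- "$path"
        n=$((n + 1))
        if [ $((n % batch)) -eq 0 ]; then sleep "$pause"; fi
    done
}

# Demo on a throwaway tree whose names contain spaces.
demo=$(mktemp -d)
mkdir -p "$demo/tree/sub dir"
touch "$demo/tree/file one" "$demo/tree/sub dir/file two"
throttled_rm "$demo/tree" 1 0
```

As with the original loops, whether this actually eases the I/O pressure depends on how the filesystem schedules its metadata updates, so the batch size and pause need experimenting.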