Deleting a large number of files on Linux eats up CPU
I generate more than 50 GB of cache files on my RHEL server (the typical file size is 200 kB, so the number of files is huge). When I try to delete these files it takes 8-10 hours.
However, the bigger issue is that the system load goes critical for those 8-10 hours. Is there any way I can keep the system load under control during the deletion?
I tried using
nice -n19 rm -rf *
but that doesn't help with the system load.
P.S. I asked the same question on superuser.com but didn't get a good enough answer, so I'm trying here.
Here are some benchmarks for various operations and filesystems for your reference. (On a busy system you would of course get different results, but hopefully this gives you an idea of what to expect.)
If I were in your shoes, I would try to get a baseline benchmark of the scenario:
- establish how long the operation takes on bare hardware, isolated from everything else (and yes, it should take much, much less than 7-8 hours even on pretty old hardware); see the sketch after this list
- add the other operations that typically occur, in a controlled manner, and see what actually makes it run so long
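For example, a rough baseline test could build a throwaway tree of ~200 kB files on the same filesystem as the real cache and time its removal on an otherwise idle box. This is only a sketch; the path and the file count are placeholders to scale up to your real cache:
#!/bin/bash
# Build a disposable test tree of small files, then time its deletion.
# TESTDIR and the file count are placeholders for this sketch.
TESTDIR=/path/on/the/cache/filesystem/deltest
mkdir -p "$TESTDIR"
for i in $(seq 1000); do
    dd if=/dev/zero of="$TESTDIR/$i" bs=200k count=1 2>/dev/null
done
time rm -rf "$TESTDIR"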
Some numbers.
On a 5-year-old notebook, ext3 mounted rw,noatime, running top and not much else, create 10k directories with the shell script create10kdirs.sh:
#!/bin/bash
# create 10000 numbered directories in the current directory
for i in $(seq 10000)
do
    mkdir "$i"
done
sudo time ./create10kdirs.sh
24.59user
20.70system
0:47.04elapsed
96%CPU (0avgtext+0avgdata 0maxresident)k 80inputs+8outputs (1major+2735150minor)pagefaults 0swaps
delete 10k directories with
sudo time rm -rf *
0.10user
19.75system
0:20.71elapsed
95%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+8outputs (0major+222minor)pagefaults 0swaps
Same hardware, ext4 mounted rw,noatime:
create 10k directories with the shell script
sudo time ./create10kdirs.sh
23.96user
22.31system
0:49.26elapsed
93%CPU (0avgtext+0avgdata 0maxresident)k 1896inputs+8outputs (20major+2715174minor)pagefaults 0swaps
delete 10k directories with
sudo time rm -rf *
0.13user
16.96system
0:28.21elapsed
60%CPU (0avgtext+0avgdata 0maxresident)k 10160inputs+0outputs (1major+219minor)pagefaults 0swaps
On a 4-year-old notebook, xfs mounted rw,relatime,nobarrier on USB, create 10k directories with the shell script:
sudo time ./create10kdirs.sh
14.19user
13.86system
0:29.75elapsed
94%CPU (0avgtext+0avgdata 0maxresident)k 432inputs+0outputs (1major+2735243minor)pagefaults 0swaps
delete 10k directories with
sudo time rm -rf *
0.13user
2.65system
0:08.20elapsed
33%CPU (0avgtext+0avgdata 0maxresident)k 120inputs+0outputs (1major+222minor)pagefaults 0swaps
Conclusion: This old hardware would erase 400k small files+folders on ext3 in approx 21s * 40 = 14m0s. On xfs (with nobarrier) it would do it in approx 8s * 40 = 5m20s. Granted, in both test cases the test machine was not under heavy load, but to me it seems that your problems are not strictly related to your choice of filesystem.
EDIT2: Also, after running the above benchmarks I tried the delete with
find . -mindepth 1 -maxdepth 1 -delete
and here are the results:
ext3
delete 10k directories with
sudo time find . -mindepth 1 -maxdepth 1 -delete
0.04user
0.44system
0:00.88elapsed
55%CPU (0avgtext+0avgdata 0maxresident)k 516inputs+8outputs (1major+688minor)pagefaults 0swaps
ext4
delete 10k directories with
sudo time find . -mindepth 1 -maxdepth 1 -delete
0.05user
0.66system
0:01.02elapsed
70%CPU (0avgtext+0avgdata 0maxresident)k 568inputs+0outputs (1major+689minor)pagefaults 0swaps
xfs
delete 10k directories with
sudo time find . -mindepth 1 -maxdepth 1 -delete
0.06user
0.84system
0:04.55elapsed
19%CPU (0avgtext+0avgdata 0maxresident)k 416inputs+0outputs (3major+685minor)pagefaults 0swaps
The real conclusion is that rm -rf is not very clever and that it will under-perform on big trees (provided that my test case is really representative).
Note: I also tested the xargs variant; it is fast, but not as fast as the above.
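For reference, a typical xargs variant looks roughly like this (the exact command is not recorded here; this is just the common form):
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 rm -rf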
As you mentioned in a comment, you are using ext3.
It is well known that rm performance for large files on ext3 is poor; it is one of the things that were fixed in ext4. See for instance this post, or kernelnewbies (which mentions that extents improve delete and truncate speeds for large files).
I do not know how much that applies to your typical file sizes, but I would expect it to apply at least a little: at around 200 kB you would already be using indirect blocks on ext3, versus possibly a single extent on ext4.
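If you want to see how a given cache file is actually laid out on disk, filefrag (from e2fsprogs) reports its extent/block mapping; the path below is just a placeholder:
filefrag -v /path/to/cache/somefile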
As a workaround (since you probably will not upgrade to ext4 just for that), delete only a few files at a time and add a sleep between the deletions. It is not pretty, but it should help reduce the load.
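A minimal sketch of that workaround, assuming the cache lives in a directory of its own; the batch size and the sleep interval are guesses you would need to tune:
#!/bin/bash
# Delete the cache in small batches, pausing between batches so the disk
# and the journal can catch up. CACHE_DIR, 1000 and 2 are placeholders.
CACHE_DIR=/path/to/cache
cd "$CACHE_DIR" || exit 1
shopt -s nullglob    # expand to nothing if the directory is already empty
n=0
for f in *; do
    rm -rf "$f"
    n=$((n + 1))
    if [ $((n % 1000)) -eq 0 ]; then
        sleep 2
    fi
done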
Also, if losing the files on power loss is not a problem (since it is a cache of some sort), you could put them in a separate partition which you mkfs again on boot, and use ext3 without a journal or even ext2. The cause of the high load is probably the journal being flushed to disk conflicting with the reads (you mentioned in another post that you have lots of concurrent reads).
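A rough sketch of that setup, assuming a dedicated cache partition and mount point (the device and path below are placeholders) and that the cache can be rebuilt from scratch at every boot:
#!/bin/bash
# Run early at boot, before the application starts.
# /dev/sdb1 and /var/cache/myapp are placeholder names for this sketch.
umount /var/cache/myapp 2>/dev/null
mkfs.ext2 -q /dev/sdb1                       # journal-less filesystem for the cache
mount -o noatime /dev/sdb1 /var/cache/myapp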
Maybe the shell is the cause of the problem. You should use find directly:
find /dir -mindepth 1 -maxdepth 1 -delete
This may or may not be related, but I have had occasions where rm could not handle the number of files I provided it on the command line (through the star operator). Instead, I would use the following command from the shell:
for i in *; do rm -rf "$i"; done
In your case, though, you could be deleting trees, in which case the above may not do what you need. You might have to split the delete operation into parts, e.g.
for i in [a-mA-M]*; do rm -rf "$i"; done
for i in [n-zN-Z]*; do rm -rf "$i"; done