Remove all but every 12th file
I have a few thousand files in the format filename.12345.end . I only want to keep every 12th file, so file.00012.end, file.00024.end ... file.99996.end and delete everything else.
The files may also have numbers earlier in their filename, and are normally of the form: file.00064.name.99999.end
I use the Bash shell and can't figure out how to loop over the files, extract the number, check whether number % 12 == 0, and delete the file if not. Can anyone help me?
Thank you, Dorina
Solution 1:
Here's a Perl solution. This should be much faster for thousands of files:
perl -e '@bad=grep{/(\d+)\.end/ && $1 % 12 != 0}@ARGV; unlink @bad' *
Which can be further condensed into:
perl -e 'unlink grep{/(\d+)\.end/ && $1 % 12 != 0}@ARGV;' *
If you have too many files and can't use the simple *, you can do something like:
perl -e 'opendir($dir,"."); unlink grep{/(\d+)\.end/ && $1 % 12 != 0} readdir($dir)'
As for speed, here's a comparison of this approach and the shell one provided in one of the other answers:
$ touch file.{01..64}.name.{00001..01000}.end
$ ls | wc
64000 64000 1472000
$ time for f in ./* ; do file="${f%.*}"; if [[ $((10#${file##*.} % 12)) -ne 0 ]]; then rm "$f"; fi; done
real 2m44.258s
user 0m9.183s
sys 1m7.647s
$ touch file.{01..64}.name.{00001..01000}.end
$ time perl -e 'unlink grep{/(\d+)\.end/ && $1 % 12 != 0}@ARGV;' *
real 0m0.610s
user 0m0.317s
sys 0m0.290s
As you can see, the difference is enormous, as expected.
Explanation
- The -e is simply telling perl to run the script given on the command line.
- @ARGV is a special variable containing all the arguments given to the script. Since we're giving it *, it will contain all the files (and directories) in the current directory.
- The grep will search through the list of file names and look for any that match a string of numbers, a dot and end (/(\d+)\.end/).
- Because the numbers (\d) are in a capture group (parentheses), they are saved as $1. So the grep will then check whether that number is a multiple of 12 and, if it isn't, the file name will be returned. In other words, the array @bad holds the list of files to be deleted.
- The list is then passed to unlink() which removes files (but not directories).
Solution 2:
Given that your filenames are in the format file.00064.name.99999.end, we first need to trim away everything except our number. We'll do this with parameter expansion inside a for loop.
We also need to tell the Bash shell to use base 10, because Bash arithmetic treats numbers beginning with a 0 as base 8, which will mess things up for us.
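To see why the base matters, here is a minimal sketch of the number extraction on its own. The function name remainder_mod12 is made up for illustration and is not part of the script in this answer; it assumes the number is the last dot-separated field before the extension.

```shell
#!/bin/bash
# remainder_mod12 is a hypothetical helper, used only for illustration.
# It extracts the trailing number from a name such as
# file.00064.name.00024.end and prints that number modulo 12.
remainder_mod12() {
    local name="$1"
    local stem="${name%.*}"     # drop the final extension (".end")
    local num="${stem##*.}"     # keep what follows the last remaining dot
    echo $(( 10#$num % 12 ))    # 10# forces base 10 despite leading zeros
}

remainder_mod12 file.00064.name.00024.end   # prints 0
remainder_mod12 file.00064.name.00019.end   # prints 7
```

Without the 10# prefix, the second call would fail, because 00019 is read as octal and 9 is not a valid octal digit.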
As a script, to be run from the directory containing the files:
#!/bin/bash
for f in ./*
do
    if [[ -f "$f" ]]; then
        file="${f%.*}"
        if [[ $((10#${file##*.} % 12)) -ne 0 ]]; then
            rm "$f"
        fi
    else
        echo "$f is not a file, skipping."
    fi
done
Or you can use this very long ugly command to do the same thing:
for f in ./* ; do if [[ -f "$f" ]]; then file="${f%.*}"; if [[ $((10#${file##*.} % 12)) -ne 0 ]]; then rm "$f"; fi; else echo "$f is not a file, skipping."; fi; done
To explain all of the parts:
- for f in ./* means "for everything in the current directory, do...". This sets each file or directory found as the variable $f.
- if [[ -f "$f" ]] checks whether the item found is a regular file; if not, we skip to the echo "$f is not... part, which means we don't start deleting directories accidentally.
- file="${f%.*}" sets the $file variable to the filename with whatever comes after the last . trimmed off.
- if [[ $((10#${file##*.} % 12)) -ne 0 ]] is where the main arithmetic kicks in. The ${file##*.} trims everything before the last . in our filename without extension. $(( $num % $num2 )) is the syntax for Bash arithmetic to use the modulo operation; the 10# at the start tells Bash to use base 10, to deal with those pesky leading 0s. $((10#${file##*.} % 12)) then leaves us the remainder of our filename's number divided by 12. -ne 0 checks whether the remainder is "not equal" to zero.
- If the remainder is not equal to 0, the file is deleted with the rm command. You may want to replace rm with echo when first running this, to check that you get the expected files to delete.
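The dry run suggested in the last point could look like the following sketch. The helper name list_deletions is hypothetical, and the check that the extracted field is purely numeric is an extra safeguard that the script above does not have.

```shell
#!/bin/bash
# Dry run: print the regular files whose trailing number is not a
# multiple of 12. Nothing is deleted. list_deletions is a hypothetical name.
list_deletions() {
    local dir="$1" f name stem num
    for f in "$dir"/*; do
        [[ -f "$f" ]] || continue               # skip directories
        name="${f##*/}"                         # strip the directory part
        stem="${name%.*}"                       # strip the extension
        num="${stem##*.}"                       # trailing number field
        [[ "$num" =~ ^[0-9]+$ ]] || continue    # added safeguard: numeric only
        if (( 10#$num % 12 != 0 )); then
            printf '%s\n' "$name"
        fi
    done
}

# Try it on scratch files in a temporary directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir"/file.000{11,12,13,24}.end
list_deletions "$demo_dir"    # prints file.00011.end and file.00013.end
rm -rf "$demo_dir"
```

Once the printed list looks right, the printf line can be swapped for rm "$f".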
This solution is non-recursive, meaning that it will only process files in the current directory; it won't go into any subdirectories.
The if statement with the echo command to warn about directories is not really necessary, as rm on its own will complain about directories and not delete them, so:
#!/bin/bash
for f in ./*
do
    file="${f%.*}"
    if [[ $((10#${file##*.} % 12)) -ne 0 ]]; then
        rm "$f"
    fi
done
Or
for f in ./* ; do file="${f%.*}"; if [[ $((10#${file##*.} % 12)) -ne 0 ]]; then rm "$f"; fi; done
Will work correctly too.
Solution 3:
You can use Bash brace expansion to generate names containing every 12th number. Let's create some test data:
$ touch file.{0..9}{0..9}{0..9}{0..9}{0..9}.end # create test data
$ mv file.00024.end file.00024.end.name.99999.end # testing this form of filenames
Then we can use the following
$ ls 'file.'{00012..100..12}* # print these with numbers less than 100
file.00012.end file.00036.end file.00060.end file.00084.end
file.00024.end.name.99999.end file.00048.end file.00072.end file.00096.end
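$ rm 'file.'{00012..99996..12}* # removes the files whose first number is a multiple of 12
Note that this deletes every 12th file rather than keeping it, which is the inverse of what the question asks. The upper bound also has to stay at five digits (99996 rather than 100000); otherwise Bash pads every generated number to six digits and nothing matches. It is hopelessly slow for a large number of files, since it takes time and memory to generate thousands of names, so it's more a trick than an actually efficient solution.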
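Since the question asks to keep every 12th file, one possible adaptation of the brace expansion trick is to move the keepers aside first and then delete the rest. This is only a sketch: the helper name keep_every_12th and the keep subdirectory are made up, it assumes Bash 4 or later for zero-padded ranges, and it handles only the simple file.NNNNN.end form.

```shell
#!/bin/bash
# keep_every_12th is a hypothetical helper. It moves file.NNNNN.end into
# "$dir/keep" when NNNNN is a multiple of 12, then deletes the remaining
# file.*.end files in "$dir".
# Assumes Bash 4+ and the simple file.NNNNN.end naming only.
keep_every_12th() {
    local dir="$1" n
    mkdir -p "$dir/keep"
    for n in {00012..99996..12}; do          # zero-padded to 5 digits
        if [[ -f "$dir/file.$n.end" ]]; then
            mv "$dir/file.$n.end" "$dir/keep/"
        fi
    done
    rm -f "$dir"/file.*.end                  # everything left is not a keeper
}

# Try it on scratch files in a temporary directory.
scratch="$(mktemp -d)"
touch "$scratch"/file.000{11,12,13,24}.end
keep_every_12th "$scratch"
ls "$scratch/keep"    # lists file.00012.end and file.00024.end
rm -rf "$scratch"
```

The loop tests each generated name instead of passing all of them to one command, which avoids building a single very long argument list.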
Solution 4:
A little bit long, but it's what came to mind.
for num in $(seq 1 1 11) ; do
    for sequence in $(seq -f %05g $num 12 99999) ; do
        rm "file.$sequence.end"
    done
done
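Explanation: for each offset from 1 to 11, delete every 12th file starting at that offset. Together, the eleven passes remove every number that is not a multiple of 12.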
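To check which numbers the nested seq loops cover before deleting anything, the same loops can print instead of remove. This is a sketch with a hypothetical function name, numbers_to_delete, and a small limit in place of 99999 to keep the output short.

```shell
#!/bin/bash
# numbers_to_delete is a hypothetical helper: it prints, zero-padded to
# 5 digits, every number up to the given limit that the eleven passes
# would target. Nothing is deleted.
numbers_to_delete() {
    local limit="$1" offset
    for offset in $(seq 1 11); do
        seq -f %05g "$offset" 12 "$limit"    # offset, offset+12, offset+24, ...
    done | sort
}

numbers_to_delete 30    # prints 00001 to 00030, one per line, except 00012 and 00024
```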