Remove files but exclude all files in a list
The `rm` command is commented out so that you can check and verify that it's working as needed. Then just uncomment that line.
The "check directory" section will ensure you don't accidentally run the script from the wrong directory and clobber the wrong files.
You can remove the `echo "Deleting: $i"` line to run silently.
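```bash
#!/bin/bash
cd /home/me/myfolder2tocleanup/ || {
    # Exit if the directory isn't found.
    echo "Can't find work dir... exiting" >&2
    exit 1
}
# Without the list, grep would fail and every file would be marked for deletion.
[ -f filelist.txt ] || { echo "filelist.txt not found... exiting" >&2; exit 1; }
# filelist.txt lives in this directory, so list it inside itself to keep it.
for i in *; do
    if ! grep -qxFe "$i" filelist.txt; then
        echo "Deleting: $i"
        # The next line is commented out. Test it, then uncomment it to remove the files.
        # rm -- "$i"
    fi
done
```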
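To see why the script passes `-x` and `-F` to `grep`, here is a small Python model of the membership test. It is only a sketch: `listed_exact` and `listed_substring` are hypothetical helpers, not part of the script. `grep -qxF -e NAME` succeeds only when some line equals NAME exactly; without `-x`, any line merely containing NAME counts, so a file called `foo` would be kept just because `foobar` is on the list. `-F` similarly stops characters such as `.` in a filename from being treated as regular-expression syntax.

```python
def listed_exact(name, lines):
    """Model of `grep -qxF -e NAME FILE`: some line equals NAME exactly."""
    return any(line == name for line in lines)


def listed_substring(name, lines):
    """Model of `grep -qF -e NAME FILE` (no -x): NAME occurs inside some line."""
    return any(name in line for line in lines)


keep = ["foobar", "notes.txt"]
files = ["foo", "foobar", "notes.txt", "todo.txt"]

# Files the script would delete with -x versus without it.
with_x = [f for f in files if not listed_exact(f, keep)]
without_x = [f for f in files if not listed_substring(f, keep)]
print(with_x)     # ['foo', 'todo.txt']
print(without_x)  # ['todo.txt']  ('foo' is wrongly kept)
```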
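This Python script can do this:

```python
#!/usr/bin/env python3
import os

# Names of the files to keep, one per line.
no_remove = set()
with open('./dont-delete.txt') as f:
    for line in f:
        no_remove.add(line.strip())

for name in os.listdir('.'):
    # Skip directories: os.unlink() cannot remove them.
    if name not in no_remove and os.path.isfile(name):
        print('unlink: ' + name)
        # os.unlink(name)
```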
The important part is to uncomment the `os.unlink()` call.
NOTE: add this script and `dont-delete.txt` to your `dont-delete.txt` so that both are on the list, and keep them in the same directory.
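A slightly more defensive variant of the same idea, sketched as a reusable function. The name `cleanup` and its `dry_run` parameter are hypothetical, not from the answer above. It always keeps the list file itself, ignores blank lines in the list, skips anything that isn't a regular file, and only deletes when `dry_run=False` is passed explicitly.

```python
import os


def cleanup(directory, keep_list_name="dont-delete.txt", dry_run=True):
    """Return (and, unless dry_run, delete) files in `directory` not on the keep list."""
    with open(os.path.join(directory, keep_list_name)) as f:
        # Strip newlines/whitespace and ignore blank lines.
        keep = {line.strip() for line in f if line.strip()}
    keep.add(keep_list_name)  # never delete the list itself

    doomed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip listed names and anything that is not a regular file
        # (directories, dangling symlinks, ...); isfile() follows symlinks.
        if name in keep or not os.path.isfile(path):
            continue
        doomed.append(name)
        if not dry_run:
            os.unlink(path)
    return doomed
```

For example, `cleanup('.')` deletes nothing and returns the candidate names; once they look right, `cleanup('.', dry_run=False)` removes them. The script file itself still has to be on the list if it lives in the same directory.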
Here's a one-liner:

```bash
comm -2 -3 <(ls) <(sort dont_delete) | xargs -p rm
```

- `ls` prints all files in the current directory, in sorted order.
- `sort dont_delete` prints all the files we don't want to delete, in sorted order.
- The `<()` operator (process substitution) turns a command's output into a file-like object that `comm` can read.
- The `comm` command compares two pre-sorted files line by line. The `-2 -3` flags make `comm` print only lines contained in the first file but not the second, which is the list of files that are safe to delete. (`comm` prints no header line, so its output can be used directly.)
- Now we have the list of files to delete on standard output. We pipe this output to `xargs`, which turns the output stream into a list of arguments for `rm`. The `-p` option makes `xargs` ask for confirmation before executing.

Add `dont_delete` itself to the list, or it will be offered for deletion too. As with any pipeline through `xargs`, filenames containing spaces or quotes will be split incorrectly.
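`comm` can do this in a single pass because both inputs are sorted. The Python sketch below models what `comm -2 -3` does; `comm_23` is a hypothetical name used only for illustration. It walks the two sorted lists in parallel, like a merge, and keeps only lines found in the first. Python compares strings by code point, which corresponds to sorting with `LC_ALL=C`; in the shell, what matters is that `ls` and `sort` order names the same way.

```python
def comm_23(left, right):
    """Lines only in `left`, like `comm -2 -3 LEFT RIGHT` on sorted input."""
    result = []
    i = j = 0
    while i < len(left):
        if j >= len(right) or left[i] < right[j]:
            result.append(left[i])  # only in left: column 1, printed
            i += 1
        elif left[i] > right[j]:
            j += 1                  # only in right: column 2, suppressed by -2
        else:
            i += 1                  # in both: column 3, suppressed by -3
            j += 1
    return result


files = ["bar", "baz", "bazfoo", "foo", "kazoo"]
keep = ["bar", "foo", "kazoo"]
print(comm_23(files, keep))  # ['baz', 'bazfoo']
```

If either input were unsorted, the single forward walk would miss matches, which is why `comm` insists on sorted files.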
FWIW, it looks like you can do this natively in zsh, using the `(+cmd)` glob qualifier.

To illustrate, let's start with some files

```
% ls
bar baz bazfoo keepfiles.txt foo kazoo
```

and a whitelist file

```
% cat keepfiles.txt
foo
kazoo
bar
```

First, read the whitelist into an array:

```
% keepfiles=( "${(f)$(< keepfiles.txt)}" )
```

or perhaps better

```
% zmodload zsh/mapfile
% keepfiles=( ${(f)mapfile[./keepfiles.txt]} )
```

(the equivalent of bash's `mapfile` builtin, or its synonym `readarray`). Now we can check whether a key (filename) exists in the array using `${keepfiles[(I)filename]}`, which returns the index of the last matching element, or 0 if no match is found:

```
% print ${keepfiles[(I)foo]}
1
% print ${keepfiles[(I)baz]}
0
%
```

We can use this to make a function that returns true if there are no matches for `$REPLY` in the array:

```
% nokeep() { (( ${keepfiles[(I)$REPLY]} == 0 )); }
```

Finally, we use this function as a qualifier in our command. The `(+cmd)` qualifier runs `cmd` for each matching filename, with the name in `$REPLY`, and keeps the file only if `cmd` returns true:

```
% ls *(+nokeep)
baz bazfoo keepfiles.txt
```

or, in your case

```
% rm -- *(+nokeep)
```

(You'll likely want to add the name of the whitelist file itself to the whitelist.)
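For readers who don't use zsh, the following Python sketch mimics the pieces above; `last_index` and `nokeep` are stand-ins for illustration, not zsh APIs. One simplification: zsh treats the subscript in `(I)` as a glob pattern, whereas this sketch compares names with plain equality.

```python
def last_index(array, value):
    """Like zsh ${array[(I)value]}: 1-based index of the last match, or 0."""
    # Scan from the end so the *last* matching element wins, as (I) does.
    for idx in range(len(array), 0, -1):
        if array[idx - 1] == value:
            return idx
    return 0


def nokeep(keepfiles, reply):
    """Like the zsh nokeep function: true when REPLY is not on the whitelist."""
    return last_index(keepfiles, reply) == 0


keepfiles = ["foo", "kazoo", "bar"]
files = ["bar", "baz", "bazfoo", "foo", "kazoo", "keepfiles.txt"]

print(last_index(keepfiles, "foo"))  # 1
print(last_index(keepfiles, "baz"))  # 0
# The (+nokeep) glob qualifier keeps a file only when nokeep succeeds.
print([f for f in files if nokeep(keepfiles, f)])  # ['baz', 'bazfoo', 'keepfiles.txt']
```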
Unless the output of `ls /home/me/myfolder2tocleanup/` exceeds the maximum shell argument limit `ARG_MAX`, which is around 2 MB on Ubuntu, I would suggest the following one-line command:

- Copy the `dont-delete.txt` file to the directory containing the files to be deleted:

```bash
cp dont-delete.txt /home/me/myfolder2tocleanup/
```

- `cd` to that directory:

```bash
cd /home/me/myfolder2tocleanup/
```

- Do a dry run to test the command. It prints the names of the files it would delete, without actually deleting them:

```bash
ls -p | grep -v / | sed 's/\<dont-delete.txt\>//g' | sort | comm -23 - <(sort dont-delete.txt) | xargs echo | tr " " "\n"
```

- If you are satisfied with the output, delete the files by running:

```bash
ls -p | grep -v / | sed 's/\<dont-delete.txt\>//g' | sort | comm -23 - <(sort dont-delete.txt) | xargs rm
```

Explanation:

- `ls -p` lists all the files and directories in the current directory; the `-p` option appends a `/` to directory names.
- `grep -v /` excludes directories by removing every entry containing a `/`.
- `sed 's/\<dont-delete.txt\>//g'` removes the `dont-delete.txt` entry, so that file does not get deleted in the process.
- `sort` sorts the remaining output of `ls`, just to make sure.
- `comm -23 - <(sort dont-delete.txt)` sorts the `dont-delete.txt` file, compares it to the sorted output of `ls`, and prints only the names that appear in the `ls` output but not in `dont-delete.txt`.
- `xargs rm` removes all the remaining filenames.

This means all the items in the current directory will be removed except directories, files listed in `dont-delete.txt`, and `dont-delete.txt` itself.

In the dry-run part:

- `xargs echo` prints the files that would be removed.
- `tr " " "\n"` translates spaces into newlines for easier reading.

Notice: parsing the output of `ls` is fragile. Filenames containing spaces, quotes or newlines will be split or mangled by `xargs`, so this approach is only safe for simple filenames.
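To make the data flow concrete, here is a Python sketch that applies the same stages to the text `ls -p` would print. The function `pipeline_candidates` is hypothetical; it drops the list file by exact name comparison instead of `sed`, and the comparison step keeps names that are in the listing but not in the keep list.

```python
def pipeline_candidates(ls_p_output, keep_list_text, keep_file="dont-delete.txt"):
    """Model the filter stages applied to `ls -p` output; returns names to delete."""
    names = ls_p_output.splitlines()
    names = [n for n in names if "/" not in n]    # grep -v /  (drop directories)
    names = [n for n in names if n != keep_file]  # drop the list file itself
    keep = set(keep_list_text.splitlines())       # contents of dont-delete.txt
    # Keep only names present in the listing but absent from the keep list.
    return sorted(n for n in names if n not in keep)


listing = "a.txt\nb file.txt\nc.txt\ndont-delete.txt\nold/\n"
keep_text = "a.txt\nmissing.txt\n"
print(pipeline_candidates(listing, keep_text))  # ['b file.txt', 'c.txt']
```

Here `b file.txt` stays a single name; in the real pipeline `xargs` would split it into `b` and `file.txt`, which is exactly the fragility mentioned above. A name such as `missing.txt` that is only on the keep list never becomes a deletion candidate.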