Remove files but exclude all files in a list

The rm command is commented out so that you can first verify that the script selects the right files. Once you're happy with the output, just un-comment that line.

The directory check at the top ensures you don't accidentally run the script from the wrong directory and clobber the wrong files.

You can remove the echo "Deleting: ..." line to run silently.

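#!/bin/bash

# Exit with a non-zero status if the directory isn't found.
cd /home/me/myfolder2tocleanup/ || {
    echo "Can't find work dir... exiting" >&2
    exit 1
}

# filelist.txt is read from this directory, so list filelist.txt in itself to keep it.
for i in *; do
    # -x: match the whole line, -F: fixed string (no regex), -q: no output
    if ! grep -qxFe "$i" filelist.txt; then
        echo "Deleting: $i"
        # The next line is commented out. Test first, then uncomment it to remove the files.
        # rm -- "$i"
    fi
done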

This Python script can do this:

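#!/usr/bin/env python3
import os

# Collect the names that must not be deleted, one per line.
no_remove = set()
with open('./dont-delete.txt') as f:
    for line in f:
        no_remove.add(line.strip())

# Report every entry of the current directory that is not on the list.
for name in os.listdir('.'):
    if name not in no_remove:
        print('unlink:' + name)
        #os.unlink(name)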

The important part is to uncomment the os.unlink() call once the printed list looks right.

NOTE: list both this script and dont-delete.txt inside dont-delete.txt so that neither of them gets deleted, and keep them in the same directory.
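
If you'd rather not edit the script between the test run and the real run, the same idea can be wrapped in a function with a dry-run switch. This is only a sketch under a few assumptions of mine: the function name remove_unlisted and its parameters are made up for illustration, the list file is assumed to live in the directory being cleaned, and directories are skipped rather than removed.

```python
import os


def remove_unlisted(directory, keep_file, dry_run=True):
    """Delete regular files in `directory` that are not named in `keep_file`.

    `keep_file` is a file name inside `directory` with one name per line.
    Returns the sorted list of names that were (or would be) removed.
    """
    with open(os.path.join(directory, keep_file)) as fh:
        keep = {line.strip() for line in fh}
    # Always protect the list file itself, even if it is not listed.
    keep.add(keep_file)

    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip protected names and anything that is not a regular file.
        if name in keep or not os.path.isfile(path):
            continue
        removed.append(name)
        if not dry_run:
            os.unlink(path)
    return removed
```

Calling remove_unlisted('.', 'dont-delete.txt') only reports what would go; pass dry_run=False once the returned list looks right.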


Here's a one-liner:

comm -2 -3 <(ls) <(sort dont_delete) | grep -vxF dont_delete | xargs -p rm

  1. ls prints all files in the current directory (in sorted order)
  2. sort dont_delete prints all the files we don't want to delete in sorted order
  3. the <() operator (process substitution) runs a command and presents its output as a file that another command can read
  4. The comm command compares two pre-sorted files and prints three columns: lines found only in the first file, lines found only in the second, and lines found in both
  5. using the -2 -3 flags suppresses the second and third columns, so comm only prints lines contained in the first file but not the second, which will be the list of files that are safe to delete
  6. comm prints no heading, but the dont_delete file itself shows up in the ls output, so grep -vxF dont_delete drops that exact line to keep the list file from being deleted
  7. Now we get a list of files to delete on standard out. We pipe this output to xargs which will turn the output stream into a list of arguments for rm. The -p option forces xargs to ask for confirmation before executing. Note that xargs splits its input on whitespace, so this breaks on file names containing spaces.
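
To make the comm step less magical: keeping only the first column amounts to a set difference between the directory listing and the keep list. Here is a small Python sketch of that idea; the function name only_in_first and the sample names are mine, purely for illustration. Unlike comm, it does not need sorted input.

```python
def only_in_first(listing, keep):
    """Return the names present in `listing` but absent from `keep`, sorted."""
    return sorted(set(listing) - set(keep))


# Hypothetical directory listing and keep list.
listing = ["bar", "baz", "dont_delete", "foo"]
keep = ["foo", "bar"]

to_delete = only_in_first(listing, keep)
print(to_delete)  # ['baz', 'dont_delete']
```

Note that the list file itself lands in the result unless it is also named in the list, which is why it has to be filtered out or listed explicitly.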

FWIW it looks like you can do this natively in zsh, using the (+cmd) glob qualifier.

To illustrate, let's start with some files

 % ls
bar  baz  bazfoo  keepfiles.txt  foo  kazoo

and a whitelist file

 % cat keepfiles.txt
foo
kazoo
bar

First, read the whitelist into an array:

 % keepfiles=( "${(f)$(< keepfiles.txt)}" )

or perhaps better

 % zmodload zsh/mapfile
 % keepfiles=( ${(f)mapfile[./keepfiles.txt]} )

(roughly the equivalent of bash's mapfile builtin, or its synonym readarray). Now we can check whether a value (filename) exists in the array using ${keepfiles[(I)filename]}, which returns the index of the last matching element, or 0 if no match is found:

 % print ${keepfiles[(I)foo]}
1
 % print ${keepfiles[(I)baz]}
0
 %

We can use this to make a function that returns true if there are no matches for $REPLY in the array:

 % nokeep() { (( ${keepfiles[(I)$REPLY]} == 0 )); }

Finally, we use this function as a qualifier in our command:

 % ls *(+nokeep)
baz  bazfoo  keepfiles.txt

or, in your case

 % rm -- *(+nokeep)

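(You'll likely want to add the name of the whitelist file itself to the whitelist. Since the (I) subscript is matched as a pattern, ${keepfiles[(Ie)$REPLY]} is the safer form if your file names may contain glob characters.)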
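
For readers who don't speak zsh, here is roughly what the (I) lookup and the nokeep filter are doing, sketched in Python. The helper name last_index is mine, and it compares plain strings rather than zsh patterns, so treat it as an approximation of the behaviour, not a reimplementation.

```python
def last_index(array, value):
    """1-based index of the last element equal to `value`, or 0 if absent."""
    for i in range(len(array), 0, -1):
        if array[i - 1] == value:
            return i
    return 0


keepfiles = ["foo", "kazoo", "bar"]
files = ["bar", "baz", "bazfoo", "keepfiles.txt", "foo", "kazoo"]

print(last_index(keepfiles, "foo"))  # 1
print(last_index(keepfiles, "baz"))  # 0

# Same selection as *(+nokeep): keep only names with no match in the array.
not_kept = [f for f in files if last_index(keepfiles, f) == 0]
print(not_kept)  # ['baz', 'bazfoo', 'keepfiles.txt']
```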


Unless the output of ls /home/me/myfolder2tocleanup/ exceeds the maximum shell argument limit ARG_MAX, which is around 2MB for Ubuntu, I would suggest the following.

A one-line command that will do the job is as follows:

  1. Copy the dont-delete.txt file to the directory containing the files to be deleted, like so:
cp dont-delete.txt /home/me/myfolder2tocleanup/
  2. cd to the directory containing the files to be deleted, like so:
cd /home/me/myfolder2tocleanup/
  3. Do a dry run that prints the names of the files that would be deleted, without actually deleting them, like so:
ls -p | grep -v / | sed '/^dont-delete\.txt$/d' | sort | comm -23 - <(sort dont-delete.txt) | xargs echo | tr " " "\n"
  4. If you are satisfied with the output, delete the files by running the command like so:
ls -p | grep -v / | sed '/^dont-delete\.txt$/d' | sort | comm -23 - <(sort dont-delete.txt) | xargs rm

Explanation:

  • ls -p will list all the files and directories in the current directory and the option -p will add a / to the directory names.
  • grep -v / will exclude directories by removing all items containing a / in their names.
  • sed '/^dont-delete\.txt$/d' will drop the line naming the dont-delete.txt file, so it does not get deleted in the process.
  • sort will, just to make sure, sort the remaining output of ls.
  • comm -23 - <(sort dont-delete.txt) will sort the dont-delete.txt file, compare it to the sorted output of ls and print only the filenames that appear in the ls output but not in dont-delete.txt (-2 suppresses names found only in dont-delete.txt, -3 suppresses names found in both).
  • xargs rm will remove all the remaining filenames in the already processed output of ls. This means all the items in the current directory will be removed except for directories, files listed in the dont-delete.txt file and the dont-delete.txt file itself.

In the dry-run part:

  • xargs echo will print the files that should be removed.
  • tr " " "\n" will translate spaces into new lines for easier readability.

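Notice:

Parsing the output of ls is fragile: file names containing spaces or newlines get split by xargs into several arguments, so these commands are only safe for simple file names.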
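
One way to sidestep the ls parsing altogether is to ask the OS for the directory entries directly. Below is a minimal Python sketch, not a drop-in replacement for the one-liner: the names regular_files and naive_split are hypothetical, and naive_split only approximates how xargs splits its input by default (it ignores quote and backslash handling).

```python
import os


def regular_files(directory="."):
    """Names of regular files in `directory`, with no text parsing involved."""
    with os.scandir(directory) as entries:
        return sorted(
            e.name for e in entries if e.is_file(follow_symlinks=False)
        )


def naive_split(text):
    """Whitespace splitting, roughly what happens to ls output fed to xargs."""
    return text.split()


# A name with a space survives os.scandir() but not whitespace splitting.
print(naive_split("my file.txt\nnotes.txt\n"))  # ['my', 'file.txt', 'notes.txt']
```

The list returned by regular_files() can then be compared against the keep list without worrying about how the names would be split.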