Excluding certain files and directories when deleting files

My top-level directory is data. It contains several directories, each with its own sub-directories. I need to remove all files and directories inside data/ except for a few files in some of those directories.

For example, data includes the directories 100 and 101. I want to keep only a.txt and b.txt in 100/, and only c.txt and d.txt in 101/, and remove every other file and directory in 100 and 101.

Example:

.
├── 100
│   ├── a.txt
│   ├── b.txt
│   ├── c.txt
│   └── d.txt
└── 101
    ├── a.txt
    ├── b.txt
    ├── c.txt
    └── d.txt

I use the command rm -rf !(a.txt|b.txt), but I can't find a way to apply it to each directory automatically.


Solution 1:

As you already found out, you can use the extglob feature, which is enabled with:

shopt -s extglob

This lets you exclude matches, so you can do things like:

rm 100/!([ab].txt) 101/!([cd].txt)

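It's a good idea to test it with echo first. This example matches everything inside 100/ that is not a.txt or b.txt, and everything inside 101/ that is not c.txt or d.txt. Note that plain rm refuses to remove directories, so use rm -r if matching sub-directories should be removed too. If the same rule applies to 102/ as well as 100/, you can write, e.g.: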

10[02]/!([ab].txt) # or
{100,102}/!([ab].txt)
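
To apply a different keep list to each directory automatically, one option is to loop over a mapping from directory name to keep pattern. The following is only a sketch under assumptions: the keep array, the victims variable and the throwaway tree created with mktemp are made up for the demonstration, and associative arrays need bash 4 or newer.

```shell
#!/usr/bin/env bash
# Sketch: per-directory keep lists with extglob (bash only).
# extglob must be enabled before any line using !(...) is parsed.
shopt -s extglob nullglob

# Throwaway tree mirroring the example, so nothing real is touched.
data=$(mktemp -d)
trap 'rm -rf "$data"' EXIT
mkdir -p "$data"/{100,101}/sub
touch "$data"/{100,101}/{a,b,c,d}.txt "$data"/{100,101}/sub/x.txt

# Hypothetical mapping: directory name -> names to keep, separated by |.
declare -A keep=( [100]='a.txt|b.txt' [101]='c.txt|d.txt' )

for dir in "${!keep[@]}"; do
    # Everything directly inside $dir whose name is not in the keep list.
    victims=( "$data/$dir"/!(${keep[$dir]}) )
    if (( ${#victims[@]} > 0 )); then
        echo rm -r -- "${victims[@]}"   # preview of the command
        rm -r -- "${victims[@]}"        # the real deletion
    fi
done

# List what survived, relative to the data directory.
remaining=$(cd "$data" && find . -type f | LC_ALL=C sort)
printf '%s\n' "$remaining"
```

Comment out the rm line to get a pure dry run. Keep in mind that !(...) does not match hidden files unless dotglob is also set.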

Solution 2:

You could use find for this. You can negate a test in find with -not or !, so that it excludes matching paths instead of selecting them.

You should be careful not to delete any of the parent directories of the files you want to keep, especially the current directory ., so make sure you read the output thoroughly before deleting.

Based on your example you could do something like this from the data directory.

find . ! -path . ! -path ./100 ! -path ./101 ! -path "./100/[ab].txt" ! -path "./101/[cd].txt"

Add a ! -path ./path/to/dir test for each path you want to avoid deleting. You can use metacharacters like *, but make sure you quote the expression if you do, e.g. "./path*dir", so the shell does not expand it before find sees it.

find is recursive by default. Even though ./100 itself is excluded from the results here, find still descends into it and reports all of its contents unless they match the pattern [ab].txt. If one pattern can't cover all the names you want to keep, add another test:

! -path "./100/[ab].txt" ! -path ./100/foo

This won't find a.txt or b.txt or foo, but it will find all other files.
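
To see the selection before deleting anything, the tests can be tried on a scratch copy. This is a minimal sketch, assuming a throwaway tree built with mktemp; the extra names (sub, x.txt, e.txt) and the selected variable are invented for the illustration.

```shell
#!/usr/bin/env bash
# Sketch: preview which paths the negated -path tests select.
data=$(mktemp -d)
trap 'rm -rf "$data"' EXIT
mkdir -p "$data"/100/sub "$data"/101
touch "$data"/100/{a,b,c}.txt "$data"/100/foo "$data"/100/sub/x.txt
touch "$data"/101/{c,d,e}.txt

cd "$data" || exit 1

# No -delete here: find only prints the paths that pass every test.
selected=$(find . ! -path . ! -path ./100 ! -path ./101 \
    ! -path "./100/[ab].txt" ! -path ./100/foo \
    ! -path "./101/[cd].txt" | LC_ALL=C sort)
printf '%s\n' "$selected"
# Prints:
# ./100/c.txt
# ./100/sub
# ./100/sub/x.txt
# ./101/e.txt
```

Everything printed is what a later deletion would remove; a.txt, b.txt and foo in 100/ and c.txt and d.txt in 101/ are absent from the list.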

Once the output lists exactly the files you want to remove and nothing else, add -delete to the end of the command to delete them:

find . ! -path . ! -path ./100 ! -path ./101 ! -path "./100/[ab].txt" ! -path "./101/[cd].txt" -delete
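
One way to make sure the preview and the deletion use exactly the same tests is to keep them in a bash array. The sketch below assumes a throwaway tree from mktemp, and the tests and left variables are hypothetical names. -delete is an extension found in GNU and BSD find rather than in POSIX, and it implies -depth, so the contents of a directory are removed before the directory itself.

```shell
#!/usr/bin/env bash
# Sketch: share one list of tests between the preview and the deletion.
data=$(mktemp -d)
trap 'rm -rf "$data"' EXIT
mkdir -p "$data"/100/sub "$data"/101
touch "$data"/100/{a,b,c,d}.txt "$data"/100/sub/x.txt
touch "$data"/101/{a,b,c,d}.txt

cd "$data" || exit 1

# The tests live in one array so the two runs cannot drift apart.
tests=( ! -path . ! -path ./100 ! -path ./101
        ! -path "./100/[ab].txt" ! -path "./101/[cd].txt" )

find . "${tests[@]}" -print     # preview: read this output first
find . "${tests[@]}" -delete    # same tests, now deleting

# List the files that are left.
left=$(find . -type f | LC_ALL=C sort)
printf '%s\n' "$left"
```

After the run, only a.txt and b.txt remain in 100/ and only c.txt and d.txt remain in 101/.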