Recursively search files with exclusions and inclusions
Solution 1:
tl;dr
Something similar to
find /local/data/ \
! -path '/local/data/database/session*' \
-o -path '/local/data/database/session_*.db'
Preamble
There are no simple --include and --exclude directives in the implementations of find I know. In any case you can build a sequence of tests that will work as you wish, because the mechanism of tests in find is deliberately designed to allow any (even a custom) test based on any criteria (i.e. not necessarily on the pathname). To do what you want you need to translate your exclude/include patterns to a sequence of tests. To do this properly you need to know how find works. Its mechanism is more general than the concept of excluding/including.
Here I will rely mostly on the POSIX specification for find (all citations are from this document). Implementations that go beyond this specification expand the tool without changing its general philosophy.
Theory
To understand and effectively use find you need to know a few things:
-
Terminology:
- There are a few possible options (like -L) that may appear just after find. For the purpose of this answer they are not important.
- Then there is one or more starting points. /local/data/ in your example is a starting point. Some implementations allow zero starting points (then . or ./ is the default starting point).
- Everything that follows forms an expression. The expression consists of zero or more supported operands: primaries like -name, -exec; operators like -o, ( (which often should be escaped or quoted to protect it from the shell) or !. Some of them require custom additional operands (e.g. patterns) that also belong to the expression.
-
Almost everything in the expression is a test. The manual for GNU find in my Ubuntu divides supported operands into categories: tests, actions etc. Still most of them can be treated as tests; i.e. any primary returns either true or false, which affects what find does next. In this answer I use the word "test" in a very broad sense.
-
find starts from the specified starting point and recursively descends the directory hierarchy in a certain sequence. Some operands can alter the sequence (-depth) or even reduce it (-prune).
-
find evaluates the expression for each file separately.
-
find evaluates the expression from left to right. The tool may rearrange tests if this does not affect the overall outcome (not only what goes to stdout; note that -exec can do anything). Some implementations do this for performance; even then the expression should work as if it was evaluated from left to right. Some operands work regardless of their position in the expression though (-depth, -xdev).
-
For a given file some part(s) of the expression may not be evaluated at all. Operators -a, -o, ( + ), ! define the logic of the expression.

The primaries can be combined using the following operators (in order of decreasing precedence):

( expression )
  True if expression is true.
! expression
  Negation of a primary; the unary NOT operator.
expression [-a] expression
  Conjunction of primaries; the AND operator is implied by the juxtaposition of two primaries or made explicit by the optional -a operator. The second expression shall not be evaluated if the first expression is false.
expression -o expression
  Alternation of primaries; the OR operator. The second expression shall not be evaluated if the first expression is true.

Imagine -test1, -test2 and -test3 are tests find understands. Let the expression be
! -test1 -test2 -o -test3
which is equivalent to
( ( ! -test1 ) -a -test2 ) -o -test3
In a shell the full commands would be respectively:
find /starting/point ! -test1 -test2 -o -test3
find /starting/point \( \( ! -test1 \) -a -test2 \) -o -test3
Possible outcomes:
-
-test1 is evaluated for every file tested.
  - If -test1 is false, ( ! -test1 ) is true. Then -test2 is evaluated because this is how -a works.
    - If -test2 is false, the expression in the outer parentheses is false. Then -test3 is evaluated because this is how -o works.
      - If -test3 is false, the entire expression is false.
      - If -test3 is true, the entire expression is true.
    - If -test2 is true, the expression in the outer parentheses is true. Then -test3 is not evaluated because this is how -o works. The entire expression is true.
  - If -test1 is true, ( ! -test1 ) is false. Then -test2 is not evaluated because this is how -a works. The expression in the outer parentheses is false. Then -test3 is evaluated because this is how -o works.
    - If -test3 is false, the entire expression is false.
    - If -test3 is true, the entire expression is true.
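To make the outcomes above concrete, here is a minimal sketch that substitutes real primaries for the placeholders: -name 'a*' for -test1, -type f for -test2 and -name '*.log' for -test3. The sandbox directory and the file names are made up for this illustration only.

```shell
# Hypothetical sandbox; none of these names come from the question.
demo=$(mktemp -d)
touch "$demo/apple.txt" "$demo/banana.txt" "$demo/avocado.log"
mkdir "$demo/cherry"

# ! -test1 -test2 -o -test3   with
#   -test1 = -name 'a*'    -test2 = -type f    -test3 = -name '*.log'
# No -print is given, so the implicit one applies to the whole expression.
outcome=$(cd "$demo" && find . ! -name 'a*' -type f -o -name '*.log' | LC_ALL=C sort)
printf '%s\n' "$outcome"

rm -r "$demo"
```

Here banana.txt gets through the left branch (its name does not start with a, and it is a regular file), avocado.log fails the left branch but is rescued by the -o branch, and apple.txt, the cherry directory and . itself fail both.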
Note that logically ( ( NOT A ) AND B ) OR C is equivalent to C OR ( B AND ( NOT A ) ), but with find the following expressions are not equivalent, in general they are pairwise different:

! -test1 -test2 -o -test3
-test2 ! -test1 -o -test3
-test3 -o ! -test1 -test2
-test3 -o -test2 ! -test1

This is especially true if one or more tests are -exec. Often -exec is used to conditionally do something (example), so it will be after other tests (conditions) and we will rather say it's an action, not a test. But you can write a custom test with -exec (example) and this is very powerful; in such case -exec may be even the first test, the one that is always evaluated. Not only the logical outcome (true or false) from -exec makes find perform or skip later tests for the file. What -exec does (e.g. imagine it removes some accompanying files) can affect later tests (for the same file or even for other files), possibly in a non-obvious way.
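As a small sketch of -exec acting as a custom test, and of the order of primaries changing the result, consider the following. The file names and the word "needle" are invented for the example; grep -q is used only because its exit status is a convenient true/false.

```shell
# Hypothetical sandbox with one file that contains "needle" and one that does not.
demo=$(mktemp -d)
printf 'needle\n' > "$demo/one.txt"
printf 'hay\n'    > "$demo/two.txt"

# -exec as a test: grep -q exits with 0 (true) only if the file contains "needle",
# so -print is reached only for such files.
as_test=$(cd "$demo" && find . -type f -exec grep -q needle {} \; -print | LC_ALL=C sort)

# Same primaries, different order: -print comes first, so every regular file is
# printed; grep still runs afterwards but can no longer influence the output.
reordered=$(cd "$demo" && find . -type f -print -exec grep -q needle {} \; | LC_ALL=C sort)

printf 'as test:\n%s\nreordered:\n%s\n' "$as_test" "$reordered"
rm -r "$demo"
```

The first command lists only one.txt, the second lists both files, although both commands consist of exactly the same primaries.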
-
Parentheses are important. Problems where -o seems to misbehave are often solved by using parentheses (example).
-
In some circumstances -print is implicitly added:

If no expression is present, -print shall be used as the expression. Otherwise, if the given expression does not contain any of the primaries -exec, -ok, or -print, the given expression shall be effectively replaced by:

( given_expression ) -print
Notes
- In this case -print will be evaluated (performed) iff the given expression evaluates to true. Above, where I wrote "the entire expression is false" or "the entire expression is true", I meant what matters for the implicit -print (if applicable).
- Implementations may expand the set "-exec, -ok, -print" with other (non-POSIX) primaries.
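The interplay between parentheses and the implicit -print can be sketched as follows. The two files a and b are hypothetical; the point is only to compare three spellings of what looks like the same search.

```shell
# Hypothetical sandbox with two empty files.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b"

# No -exec, -ok or -print: find behaves as if we wrote ( ... ) -print.
implicit=$(cd "$demo" && find . -name a -o -name b | LC_ALL=C sort)

# An explicit -print switches the implicit one off. Because -a binds tighter
# than -o, this -print belongs to "-name b" only, so "a" is not printed.
explicit=$(cd "$demo" && find . -name a -o -name b -print | LC_ALL=C sort)

# Parentheses restore the intended grouping.
grouped=$(cd "$demo" && find . \( -name a -o -name b \) -print | LC_ALL=C sort)

printf 'implicit: %s\nexplicit: %s\ngrouped:  %s\n' \
    "$(echo $implicit)" "$(echo $explicit)" "$(echo $grouped)"
rm -r "$demo"
```

The second form is the typical case where -o "seems to misbehave": nothing is wrong with -o, the explicit -print simply applies to less than one might expect.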
Solution
The question is about exclusions/inclusions based on pathnames. The following primaries are useful:
-name pattern
  The primary shall evaluate as true if the basename of the current pathname matches pattern using the pattern matching notation […]
-path pattern
  The primary shall evaluate as true if the current pathname matches pattern using the pattern matching notation […]
-prune
  The primary shall always evaluate as true; it shall cause find not to descend the current pathname if it is a directory. If the -depth primary is specified, the -prune primary shall have no effect.
(Terms like "basename" or "pathname" are defined here.)
Implementations may add other useful primaries (e.g. -regex, -iname).
Often -prune is the right way to exclude the content of the given directory (with or without the directory itself). But it totally prevents find from entering the directory; so if you want to find (include) some files in the directory anyway, then you cannot use -prune.
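A minimal sketch of why -prune cannot be combined with a later "include" for files inside the pruned directory. The directory name cache and the file names are assumptions made up for this example.

```shell
# Hypothetical sandbox: ./cache is the directory we want to exclude.
demo=$(mktemp -d)
mkdir -p "$demo/cache/deep"
touch "$demo/keep.txt" "$demo/cache/junk" "$demo/cache/deep/wanted.txt"

# Prune ./cache: neither the directory itself nor anything below it is printed.
pruned=$(cd "$demo" && find . -path ./cache -prune -o -print | LC_ALL=C sort)

# find never enters ./cache, so no later test can bring wanted.txt back.
still_missing=$(cd "$demo" && find . -path ./cache -prune -o -name 'wanted.txt' -print)

printf 'pruned listing:\n%s\n' "$pruned"
printf 'wanted.txt found: [%s]\n' "$still_missing"
rm -r "$demo"
```

The second result is empty, which is exactly the situation where tests on the pathname have to be used instead of -prune.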
I think you want this:
- Print pathname of each file in the directory hierarchy starting from /local/data/,
- but don't if it matches /local/data/database/session*,
- but do if it matches /local/data/database/session_*.db.
The following find command should do it:
find /local/data/ \
! -path '/local/data/database/session*' \
-o -path '/local/data/database/session_*.db'
where \ before a newline tells the shell the command continues in the next line. Quoting is important (you probably know, you quoted in the question).
It works like this:
- For each file under (and including) the starting point but not matching the exclusion pattern, ! -path … is true; the second test is not performed and the entire expression is true.
- For each file under (and including) the starting point and matching the exclusion pattern, ! -path … is false; only then the second test is performed.
  - If the second test is true, the entire expression is true.
  - If the second test is false, the entire expression is false.
Notes:
- This is a case where the implicit -print is added.
- These tests in the reverse order would work as well.
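If you want to try the command without touching real data, something like the following sketch may help. It rebuilds a similar layout in a temporary directory; the file names (users.db, session.lock and so on) are invented, and the sandbox path stands in for /local/data.

```shell
# Hypothetical sandbox standing in for /local/data.
root=$(mktemp -d)
mkdir -p "$root/database"
touch "$root/notes.txt" \
      "$root/database/users.db" \
      "$root/database/session.lock" \
      "$root/database/session_1.db" \
      "$root/database/session_1.tmp"

# Same shape as the command above, with the sandbox as the starting point.
found=$(find "$root" \
    ! -path "$root/database/session*" \
    -o -path "$root/database/session_*.db" | LC_ALL=C sort)

# Replace the random sandbox prefix with ROOT so the listing is easy to read.
relative=$(printf '%s\n' "$found" | sed "s|^$root|ROOT|")
printf '%s\n' "$relative"
rm -r "$root"
```

In this listing session.lock and session_1.tmp are absent (excluded), while session_1.db is present (included again) together with everything that never matched the exclusion pattern.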
General case
With parentheses, -a, -o and ! you can create quite complex exclude+include schemes. In particular:
- nested (e.g. exclude ./foo/*, but include ./foo/bar/*, but exclude ./foo/bar/baz/*, but …);
- based on criteria other than pathnames (e.g. totally exclude directories owned by root).
It may not be easy, though, to create expressions that implement complex schemes flawlessly.
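One possible way to express the nested scheme mentioned above (exclude ./foo/*, include ./foo/bar/*, exclude ./foo/bar/baz/*) is sketched below. This is just one formulation under the assumption that only pathnames matter; the files top, x, y and z are made up.

```shell
# Hypothetical hierarchy for the nested exclude/include/exclude scheme.
demo=$(mktemp -d)
mkdir -p "$demo/foo/bar/baz"
touch "$demo/top" "$demo/foo/x" "$demo/foo/bar/y" "$demo/foo/bar/baz/z"

# Left branch: anything not under ./foo/ passes.
# Right branch (reached only for ./foo/*): pass if under ./foo/bar/
# but not under ./foo/bar/baz/.
nested=$(cd "$demo" && find . \
    ! -path './foo/*' \
    -o -path './foo/bar/*' ! -path './foo/bar/baz/*' | LC_ALL=C sort)

printf '%s\n' "$nested"
rm -r "$demo"
```

Note that ./foo and ./foo/bar/baz themselves are printed, because the patterns end with /* and therefore match only what is inside these directories; ./foo/bar is not printed because it matches ./foo/* but not ./foo/bar/*.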
Pitfalls
-
Metacharacters (e.g. *) in patterns do not treat / or . specially. The fragment session_*.db matches session_5.db, it also matches session_foo/bar/baz.db.
-
In cases when you can use -prune, remember -prune evaluates as true. With implicit -print this may surprise you. That's why I wrote "-prune is the right way to exclude the content of the given directory (with or without the directory itself)".
-
In cases when you can use -prune, make sure it gets evaluated when you need it.

Example:

mkdir -p test/ab/a; cd test
find . -name 'a*' -print -o -name '*b' -prune           #1
find . -name '*b' -prune -o -name 'a*' -print           #2
find . -name '*b' -prune -print -o -name 'a*' -print    #3
find . \( -name '*b' -prune -o -name 'a*' \) -print     #4
find . -name '*b' -prune -o -name 'a*'                  #5

In the first case the directory named ab will be printed and not pruned. In the second case it will be pruned and not printed. In the third case it will be pruned and printed once. The fourth case is equivalent to the third, -print has been placed behind the parentheses (like a common factor in math). The fifth case is equivalent to the fourth, -print is implicit.

The first case is an example of a more general problem (bug), where some file (here ab directory) never reaches the test designed for it and the right action, because it accidentally matches an earlier test designed with other files in mind, and triggers an unwanted action.
-
Pathnames used by -path are what find "thinks" they are, not what realpath would print. Patterns must take this into account.

Example:

cd /bin && find . -path '/bin*'      # will find nothing
cd /bin && find . -path '.*'         # will find "everything"
cd /bin && find /bin -path '/bin*'   # will find "everything"
cd /bin && find /bin -path '.*'      # will find nothing

Similarly for a starting point the basename used by -name depends on the exact representation of the starting point. Edge cases, but still:
- / for /, ///, //// etc.
- . for ., ./, /., /bin/., /bin/../. etc.
- .. for .., /.., /../../, ///bin/.. etc.

-
Each starting point defines a separate hierarchy. The tool doesn't care if the hierarchies overlap.

Example: if /bin/bash and /bin/dash exist, the following command will find bash four times (with three different pathnames) and dash three times (with two different pathnames):

cd /bin && find . /bin /bin ../bin/bash -name '[bd]ash'
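To see the first pitfall (metacharacters not treating / specially) in action, here is a small sketch. The extra test in the second command is only one possible workaround, not something prescribed by POSIX, and the file names are made up to mirror the ones used in that pitfall.

```shell
# Hypothetical sandbox mirroring the names from the first pitfall.
demo=$(mktemp -d)
mkdir -p "$demo/session_foo/bar"
touch "$demo/session_5.db" "$demo/session_foo/bar/baz.db"

# The * in the pattern happily matches across slashes.
greedy=$(cd "$demo" && find . -path './session_*.db' | LC_ALL=C sort)

# One possible workaround: additionally reject any pathname that has
# a slash somewhere after "session_".
shallow=$(cd "$demo" && find . -path './session_*.db' ! -path './session_*/*')

printf 'greedy:\n%s\nshallow:\n%s\n' "$greedy" "$shallow"
rm -r "$demo"
```

The first command reports both files, the second only ./session_5.db.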