rsync all pdfs except in certain directories?

I'm trying hard to understand the rsync filter system, and it's completely baffling me.

I have the following "test" directory structure to try to make sense of it. With no filter options here are all my files:

rsync -amv --dry-run /source /target

building file list ... done
source/
source/1.pdf
source/2.pdf
source/exclude_rules.txt
source/filter_rules.txt
source/excludedir/
source/excludedir/2.jpg
source/excludedir/4.pdf
source/subdir/
source/subdir/1.jpg
source/subdir/1.txt
source/subdir/3.pdf
source/subdir/subdir2/
source/subdir/subdir2/6.jpg
source/subdir/subdir2/6.pdf

I just want to sync all *.pdf files, except those in certain directories, namely any directory that has *exclude* in its name.

I'm using a file with the filter rules in it with the following command:

rsync -amv --dry-run --filter='merge /filter_rules' /source /target

The filter_rules look like variations on the following but I can't get them to produce the results I'm after:

-/ *exclude*/
+/ *.pdf
-/ *

The closest I've come is with the simple exclude:

-/ *exclude*/

Which yields:

building file list ... done
source/
source/1.pdf
source/2.pdf
source/exclude_rules.txt
source/filter_rules.txt
source/subdir/
source/subdir/1.jpg
source/subdir/1.txt
source/subdir/3.pdf
source/subdir/subdir2/
source/subdir/subdir2/6.jpg
source/subdir/subdir2/6.pdf

How do I filter the rest down to just the *.pdf files?


Solution 1:

For posterity, I did finally get this to work, and here are the instructions I wish I had had:

  • rsync starts the filter process with a full list of files
  • the filter rules are handled IN ORDER (took me a while to get this)
  • You may have all the right rules but in the wrong order, so if your rules live in separate exclude and include files, you may need to combine them into a single filter file, which lets you mix include and exclude rules in any order, or list them on the command line itself
  • For each file, the FIRST FILTER RULE THAT MATCHES puts the file into one of two buckets: include or exclude.
  • Rules after the first matching rule are not applied!
  • Each rule therefore only ever sees the files that did not match any earlier rule
  • Files that don't match any rules are INCLUDED
  • Because of that default, the final catch-all rule (-/ *) is the most important and least intuitive one: it excludes everything that wasn't specifically included UP TO THAT POINT.
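
The first-match behaviour described above can be modelled with a shell case statement, which also tries its patterns in order and stops at the first hit. The sketch below is only a toy model of the three rules from the question, not rsync's real matcher; classify is a hypothetical helper, and it looks at a single name at a time rather than a full path.

```shell
#!/bin/sh
# Toy model of "first matching rule wins" for the three rules in the question.
# classify is a hypothetical helper; it is not how rsync matches internally.
# usage: classify KIND NAME   (KIND is d for a directory, f for a file)
classify() {
    case "$1:$2" in
        d:*exclude*) echo exclude ;;   # - *exclude*/  (trailing slash: directories only)
        ?:*.pdf)     echo include ;;   # + *.pdf
        *)           echo exclude ;;   # - *
    esac
}

classify f 1.pdf        # include
classify d excludedir   # exclude
classify d subdir       # exclude: falls through to "- *"
```

The last call shows why that rule set transfers nothing useful: an ordinary directory such as subdir is caught by the catch-all exclude, so rsync never descends into it and never gets to test the PDFs inside.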

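So here's what ended up working. The +/ */ rule is the piece that was missing: it keeps directories included so that rsync descends into them, and the -m option then prunes any that end up empty: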

-/ *exclude*/
+/ */
+/ *.pdf
-/ *

Originally I had those rules in separate --include-from and --exclude-from files, which did not allow the rules to be interleaved in the proper order.
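
Since the rules can also be given on the command line, the following sketch shows one way to try the same ordering without a filter file. It is written under a few assumptions: it uses the plain --exclude/--include options (without the / modifier used in the filter file), and it rebuilds a cut-down copy of the test tree in a scratch directory from mktemp rather than touching /source and /target.

```shell
#!/bin/sh
# Rebuild a small copy of the test tree in a scratch directory (illustrative paths).
tmp=$(mktemp -d)
mkdir -p "$tmp/source/excludedir" "$tmp/source/subdir/subdir2" "$tmp/target"
touch "$tmp/source/1.pdf" \
      "$tmp/source/excludedir/4.pdf" \
      "$tmp/source/subdir/1.txt" \
      "$tmp/source/subdir/3.pdf" \
      "$tmp/source/subdir/subdir2/6.pdf"

# Same four rules in the same order; rsync reads them left to right.
out=$(rsync -amv --dry-run \
    --exclude='*exclude*/' \
    --include='*/' \
    --include='*.pdf' \
    --exclude='*' \
    "$tmp/source" "$tmp/target")

printf '%s\n' "$out"   # dry-run listing; nothing is copied
rm -rf "$tmp"
```

Swapping the order of the options, for example moving --exclude='*' to the front, is a quick way to see the first-match behaviour for yourself.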