rsync all pdfs except in certain directories?
I'm trying hard to understand the rsync filter system, and it's completely baffling me.
I have the following "test" directory structure to try to make sense of it. With no filter options here are all my files:
rsync -amv --dry-run /source /target
building file list ... done
source/
source/1.pdf
source/2.pdf
source/exclude_rules.txt
source/filter_rules.txt
source/excludedir/
source/excludedir/2.jpg
source/excludedir/4.pdf
source/subdir/
source/subdir/1.jpg
source/subdir/1.txt
source/subdir/3.pdf
source/subdir/subdir2/
source/subdir/subdir2/6.jpg
source/subdir/subdir2/6.pdf
I just want to sync all *.pdf files except in certain directories, namely any directory that has *exclude* in its name.
I'm using a file with the filter rules in it with the following command:
rsync -amv --dry-run --filter='merge /filter_rules' /source /target
The filter_rules look like variations on the following but I can't get them to produce the results I'm after:
-/ *exclude*/
+/ *.pdf
-/ *
The closest I've come is with the simple exclude:
-/ *exclude*/
Which yields:
building file list ... done
source/
source/1.pdf
source/2.pdf
source/exclude_rules.txt
source/filter_rules.txt
source/subdir/
source/subdir/1.jpg
source/subdir/1.txt
source/subdir/3.pdf
source/subdir/subdir2/
source/subdir/subdir2/6.jpg
source/subdir/subdir2/6.pdf
How do I filter the rest to just get *.pdf?
Solution 1:
For posterity, I did finally get this to work, and here are the instructions I wish I had had:
- rsync starts the filter process with a full list of files. The filter rules are handled IN ORDER (took me a while to get this).
- You may have all the right rules but in the wrong order. If you're using external exclude or include files, you may need to combine them into a single filter file, which lets you mix and match include/exclude rules, or list the rules on the command line itself.
- For each file, the FIRST FILTER RULE THAT MATCHES puts the file into one of two buckets: include or exclude.
- Rules after the first matching rule are not applied!
- Each rule therefore only sees the files that no earlier rule matched.
- Files that don't match any rule are INCLUDED.
- The final rule in the rule set (-/ * in the rules below) is the most important and unintuitive: it excludes everything that wasn't specifically included UP TO THAT POINT.
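The first-match-wins behaviour described in the list above can be mimicked with a shell case statement, which also stops at the first pattern that matches. This is only a toy model: classify is a hypothetical helper and not part of rsync, the patterns are shell approximations of the rule set this answer settles on, shell * also matches /, and real rsync never descends into an excluded directory instead of testing each file under it.

```shell
#!/bin/sh
# Toy model of "first matching rule wins" (classify is a hypothetical helper).
# Each case arm stands in for one filter rule; the first arm that matches
# decides the outcome and later arms are never consulted.
classify() {
    case "$1" in
        *exclude*/*) echo exclude ;;   # roughly: -/ *exclude*/
        */)          echo include ;;   # roughly: +/ */
        *.pdf)       echo include ;;   # roughly: +/ *.pdf
        *)           echo exclude ;;   # roughly: -/ *
    esac
}

# Print the decision for a few paths from the test tree.
for path in source/1.pdf source/subdir/ source/subdir/1.jpg \
            source/excludedir/4.pdf source/exclude_rules.txt; do
    printf '%-28s %s\n' "$path" "$(classify "$path")"
done
```

Swapping the last two arms shows why order matters: with the catch-all first, the *.pdf arm is never reached and every PDF falls into the exclude bucket.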
So here's what ended up working:
-/ *exclude*/
+/ */
+/ *.pdf
-/ *
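The +/ */ rule is there so that rsync descends into subdirectories: without it, the final -/ * would exclude the directories themselves and nothing inside them would ever be examined. The -m option then prunes any directories that end up empty.
Originally I had those rules in separate --include-from and --exclude-from files, and that wouldn't allow for the proper order.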
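To try the rules without touching real data, something like the following sketch could be used. It rebuilds the test tree from the question in a scratch directory and runs the same dry-run command against it. The scratch location and the variable names work and result are assumptions for illustration, and the filter file is kept outside the source tree.

```shell
#!/bin/sh
# Sketch: recreate the question's test tree and dry-run the four filter rules.
# "work" and "result" are illustrative names, not anything rsync requires.
set -eu

# Nothing to demonstrate if rsync is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/source/excludedir" "$work/source/subdir/subdir2" "$work/target"
for f in 1.pdf 2.pdf excludedir/2.jpg excludedir/4.pdf \
         subdir/1.jpg subdir/1.txt subdir/3.pdf \
         subdir/subdir2/6.jpg subdir/subdir2/6.pdf; do
    : > "$work/source/$f"    # create an empty placeholder file
done

# The same four rules, in the same order, in a single merge file.
cat > "$work/filter_rules" <<'EOF'
-/ *exclude*/
+/ */
+/ *.pdf
-/ *
EOF

# -m prunes directories left empty; --dry-run lists without copying anything.
result=$(rsync -amv --dry-run --filter="merge $work/filter_rules" \
    "$work/source" "$work/target")
printf '%s\n' "$result"
```

If the rules behave as described above, the listing should show only the PDFs outside excludedir, plus the directories that contain them.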