rsync and include / exclude. How hard can it be?

I'm trying to recursively copy a directory / file structure from one directory to another, keeping only html files. That should be a simple case of include / exclude, shouldn't it?

I just want to print out the files first. When I get that right, I'll copy them.

rsync -a --list-only -v SOURCEDIR --exclude='.*' --include='**/*.html' 

Gives me all the files.

rsync -a --list-only -v SOURCEDIR --include='**/*.html' --exclude='*' 

and

rsync -a --list-only -v SOURCEDIR --include='*.html' --exclude='*' 
rsync -a --list-only -v SOURCEDIR --include=*.html --exclude=*

Give me no files.

rsync -a --list-only -v SOURCEDIR --include='*.html' --exclude='*.*'

Looks like it gives me the whole directory structure and only html files. But I don't want empty directories.

Help!

This is on Mac OS X 10.6.


Solution 1:

Have you considered using find to do your hard work?

Something along the lines of

find ./ -name "*.html" -exec rsync -R {} /target/base/directory/ \; 

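This recreates, under /target/base/directory, the part of the ./ directory tree that contains html files. The -R (--relative) option makes rsync keep each file's path instead of copying just the file name. Note that it starts one rsync process per file found.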

Solution 2:

Rsync can be confusing about selective copies like this. I use the following to do the task that you're asking for:

rsync -avP \
--filter='+ */' \
--filter='+ **/*.html' \
--filter='- *' \
--prune-empty-dirs \
--delete \
/source/ \
/dest/

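Basically you need to include all directories in the search, then add all *.html files to the list, then exclude all other files. rsync checks the rules in order and the first match wins, so the order matters. The directory rule is needed because an exclude of '*' also matches directories, and rsync never descends into an excluded directory. That is why --include='*.html' --exclude='*' on its own finds nothing.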

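The --prune-empty-dirs option is what gets rid of the empty directories: it drops from the transfer any directory that ends up with no *.html file in it after filtering. Note that --delete removes extraneous files from the destination, so leave it out unless you want /dest/ to be a mirror.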