rsync copy over only certain types of files using include option

I use the following bash script to copy only files of certain extension(in this case *.sh), however it still copies over all the files. what's wrong?

from=$1
to=$2

rsync -zarv  --include="*.sh" $from $to

Solution 1:

I think --include is used to include a subset of files that are otherwise excluded by --exclude, rather than including only those files. In other words: you have to think about include meaning don't exclude.

Try instead:

rsync -zarv  --include "*/" --exclude="*" --include="*.sh" "$from" "$to"

For rsync version 3.0.6 or higher, the order needs to be modified as follows (see comments):

rsync -zarv --include="*/" --include="*.sh" --exclude="*" "$from" "$to"

Adding the -m flag will avoid creating empty directory structures in the destination. Tested in version 3.1.2.

So if we only want *.sh files we have to exclude all files --exclude="*", include all directories --include="*/" and include all *.sh files --include="*.sh".

You can find some good examples in the section Include/Exclude Pattern Rules of the man page

Solution 2:

The answer by @chepner will copy all the sub-directories whether it contains files or not. If you need to exclude the sub-directories that don't contain the file and still retain the directory structure, use

rsync -zarv  --prune-empty-dirs --include "*/"  --include="*.sh" --exclude="*" "$from" "$to"

Solution 3:

Here's the important part from the man page:

As the list of files/directories to transfer is built, rsync checks each name to be transferred against the list of include/exclude patterns in turn, and the first matching pattern is acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern then that filename is not skipped; if no matching pattern is found, then the filename is not skipped.

To summarize:

  • Not matching any pattern means a file will be copied!
  • The algorithm quits once any pattern matches

Also, something ending with a slash is matching directories (like find -type d would).

Let's pull apart this answer from above.

rsync -zarv  --prune-empty-dirs --include "*/"  --include="*.sh" --exclude="*" "$from" "$to"
  1. Don't skip any directories
  2. Don't skip any .sh files
  3. Skip everything
  4. (Implicitly, don't skip anything, but the rule above prevents the default rule from ever happening.)

Finally, the --prune-empty-directories keeps the first rule from making empty directories all over the place.

Solution 4:

One more addition: if you need to sync files by its extensions in one dir only (without of recursion) you should use a construction like this:

rsync -auzv --include './' --include '*.ext' --exclude '*' /source/dir/ /destination/dir/

Pay your attention to the dot in the first --include. --no-r does not work in this construction.

EDIT:

Thanks to gbyte.co for the valuable comment!

EDIT:

The -uzv flags are not related to this question directly, but I included them because I use them usually.

Solution 5:

Wrote this handy function and put in my bash scripts or ~/.bash_aliases. Tested sync'ing locally on Linux with bash and awk installed. It works

selrsync(){
# selective rsync to sync only certain filetypes;
# based on: https://stackoverflow.com/a/11111793/588867
# Example: selrsync 'tsv,csv' ./source ./target --dry-run
types="$1"; shift; #accepts comma separated list of types. Must be the first argument.
includes=$(echo $types| awk  -F',' \
    'BEGIN{OFS=" ";}
    {
    for (i = 1; i <= NF; i++ ) { if (length($i) > 0) $i="--include=*."$i; } print
    }')
restargs="$@"

echo Command: rsync -avz --prune-empty-dirs --include="*/" $includes --exclude="*" "$restargs"
eval rsync -avz --prune-empty-dirs --include="*/" "$includes" --exclude="*" $restargs
}

Advantages:

short handy and extensible when one wants to add more arguments (i.e. --dry-run).

Example:

selrsync 'tsv,csv' ./source ./target --dry-run