rsync exclude according to .gitignore & .hgignore & svn:ignore like --filter=:C

Rsync includes a nifty option --cvs-exclude to “ignore files in the same way CVS does”, but CVS has been obsolete for years. Is there any way to make it also exclude files which would be ignored by modern version control systems (Git, Mercurial, Subversion)?

For example, I have lots of Maven projects checked out from GitHub. Typically they include a .gitignore listing at least target, the default Maven build directory (which may be present at top level or in submodules). Since the contents of these directories are entirely disposable, and they can be far larger than source code, I would like to exclude them when using rsync for backups.

Of course I can explicitly --exclude=target/ but that will accidentally suppress unrelated directories that just happen to be named target and are not supposed to be ignored.

And I could supply a complete list of absolute paths for all file names and patterns mentioned in any .gitignore, .hgignore, or svn:ignore property on my disk, but this would be a huge list that would have to be produced by some sort of script.

Since rsync has no built-in support for VCS checkouts other than CVS, is there any good trick for feeding it their ignore patterns? Or some kind of callback system whereby a user script can be asked whether a given file/directory should be included or not?

Update: --filter=':- .gitignore' as suggested by LordJavac seems to work as well for Git as --filter=:C does for CVS, at least on the examples I have found, though it is unclear if the syntax is an exact match. --filter=':- .hgignore' does not work very well for Mercurial; e.g. an .hgignore containing a line like ^target$ (the Mercurial equivalent of Git /target/) is not recognized by rsync as a regular expression. And nothing seems to work for Subversion, for which you would have to parse .svn/dir-prop-base for a 1.6 or earlier working copy, and throw up your hands in dismay for a 1.7 or later working copy.


Solution 1:

As mentioned by luksan, you can do this with the --filter switch to rsync. I achieved this with --filter=':- .gitignore' (there's a space before ".gitignore"), which tells rsync to do a per-directory merge of the .gitignore files it encounters and to treat every line in them as an exclude pattern. rsync's pattern syntax is close to Git's but not identical, so check the result on your own trees. You may also want to add your global ignore file, if you have one. To make it easier to use, I created an alias for rsync that includes the filter.

Solution 2:

You can use git ls-files to build the list of files excluded by the repository's .gitignore files. https://git-scm.com/docs/git-ls-files

Options:

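  • --exclude-standard Apply Git's standard exclusions: .git/info/exclude, the .gitignore in each directory, and the user's global exclude file.
  • -o (--others) Show untracked files.
  • -i (--ignored) Show only ignored files; combined with -o, this lists the untracked files that match an ignore pattern.
  • --directory If an entire directory is untracked, show just its name (with a trailing slash) rather than its contents.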

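The only thing left to exclude by hand was .git. Note that the list has to go through --exclude-from, because --exclude takes a single pattern while git ls-files prints one path per line; with backticks, every path after the first would be passed to rsync as an extra source argument. The command below uses process substitution, so it needs bash or zsh:

rsync -azP --exclude=.git --exclude-from=<(git -C <SRC> ls-files --exclude-standard -oi --directory) <SRC> <DEST>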