Create archive from difference of two folders

Solution 1:

Using diff for finding files which don't exist is severe overkill; you are doing a lot of calculations to compare the contents of the files, where clearly all you care about is whether a file name exists or not.

Maybe try this instead.

tar zcf newfiles.tar.gz $(comm -13 <(cd A && find . -type f | sort) <(cd B && find . -type f | sort) | sed 's/^\./B/')

The find commands produce a listing of the file name hierarchies; comm -13 extracts the elements which are unique to the second input file (which here isn't really a file at all; we are using the shell's process substitution facility to provide the input) and the sed command adds the path into B back to the beginning.

Passing a command substitution $(...) as the argument to tar is problematic; if there are a lot of file names, you will run into "command line too long", and if your file names contain whitespace or other irregularities in them, the shell will mess them up. The standard solution is to use xargs but using xargs tar cf will overwrite the output file if xargs ends up calling tar more than once; though perhaps your tar has an option to read the file names from standard input.

Solution 2:

With find:

$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print
./c
./d

The idea is to use the exec action with a shell script that tests the existence of the current file in the other directory. There are a few subtleties:

  • The first argument of sh -c is the script to execute, the second (here _ but could be anything else) corresponds to the $0 positional parameter of the script and the third ({}) is the current file name as set by find and passed to the script as positional parameter $1.
  • The -print action at the end is needed, even if it is normally the default with find, because the use of -exec cancels this default.

Example of use to generate your tarball with GNU tar:

$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print > ../list.txt
$ tar -c -v -f ../diff.tar --files-from=../list.txt
./c
./d

Note: if you have unusual file names the --verbatim-files-from GNU tar option can help. Or a combination of the -print0 action of find and the --null option of GNU tar.

Note: if the shell is POSIX (e.g., bash) you can also run find from the parent directory and get the path of the files relative from there, if you prefer:

$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ find B -type f -exec sh -c '[ ! -f A"${1#B}" ]' _ {} \; -print
B/c
B/d