Create archive from difference of two folders
Solution 1:
Using diff for finding files which don't exist is severe overkill; you are doing a lot of calculations to compare the contents of the files, when clearly all you care about is whether a file name exists or not.
Maybe try this instead.
tar zcf newfiles.tar.gz $(comm -13 <(cd A && find . -type f | sort) <(cd B && find . -type f | sort) | sed 's/^\./B/')
The find commands produce a listing of the file name hierarchies; comm -13 extracts the elements which are unique to the second input file (which here isn't really a file at all; we are using the shell's process substitution facility to provide the input) and the sed command adds the path into B back to the beginning.
Passing a command substitution $(...) as the argument to tar is problematic; if there are a lot of file names, you will run into "command line too long", and if your file names contain whitespace or other irregularities, the shell will mess them up. The standard solution is to use xargs, but xargs tar cf will overwrite the output file if xargs ends up calling tar more than once; though perhaps your tar has an option to read the file names from standard input.
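One way to sidestep both problems, sketched here under the assumption that GNU find, sort, comm and tar are available: keep the listings NUL-delimited and let tar read the names from standard input. The scratch directory, the list file names (a.list, b.list) and the sample file "d e" are made up for illustration, and temporary files stand in for process substitution so the sketch also runs in plain sh.

```shell
# Scratch setup mirroring the question: B has two files that A lacks.
work=$(mktemp -d)
mkdir -p "$work/A" "$work/B"
touch "$work/A/a" "$work/A/b"
touch "$work/B/a" "$work/B/b" "$work/B/c" "$work/B/d e"

# NUL-delimited, sorted listings of each tree (paths relative to the tree).
# LC_ALL=C keeps sort and comm agreeing on the collation order.
(cd "$work/A" && find . -type f -print0 | LC_ALL=C sort -z) > "$work/a.list"
(cd "$work/B" && find . -type f -print0 | LC_ALL=C sort -z) > "$work/b.list"

# comm -13 keeps the entries unique to the second list; tar reads them
# from stdin. -C makes tar resolve the names inside B.
LC_ALL=C comm -z -13 "$work/a.list" "$work/b.list" |
  tar -czf "$work/newfiles.tar.gz" -C "$work/B" --null --files-from=-

# List what ended up in the archive.
tar -tzf "$work/newfiles.tar.gz"
```

Unlike the one-liner above, this stores the members relative to B (./c rather than B/c) because of -C; that is a design choice of the sketch, not a requirement.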
Solution 2:
With find:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print
./c
./d
The idea is to use the exec action with a shell script that tests the existence of the current file in the other directory. There are a few subtleties:
- The first argument of sh -c is the script to execute, the second (here _ but could be anything else) corresponds to the $0 positional parameter of the script and the third ({}) is the current file name as set by find and passed to the script as positional parameter $1.
- The -print action at the end is needed, even if it is normally the default with find, because the use of -exec cancels this default.
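The argument mapping can be checked on its own, without find. A minimal sketch, where ./c is just a sample value standing in for what find would substitute for {}:

```shell
# "_" fills $0 (a placeholder), "./c" becomes $1 inside the script.
out=$(sh -c 'printf "%s|%s\n" "$0" "$1"' _ ./c)
printf '%s\n' "$out"   # prints: _|./c
```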
Example of use to generate your tarball with GNU tar:
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print > ../list.txt
$ tar -c -v -f ../diff.tar --files-from=../list.txt
./c
./d
Note: if you have unusual file names, the --verbatim-files-from GNU tar option can help. Or a combination of the -print0 action of find and the --null option of GNU tar.
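A sketch of the -print0 / --null combination, assuming GNU tar and a find that supports -print0. The scratch directory and the sample file name "d e" are invented for the demonstration; the list is piped straight into tar instead of going through list.txt.

```shell
# Scratch setup: same layout as above, plus a name containing a space.
work=$(mktemp -d)
mkdir -p "$work/A" "$work/B"
touch "$work/A/a" "$work/A/b"
touch "$work/B/a" "$work/B/b" "$work/B/c" "$work/B/d e"

(
  cd "$work/B" || exit 1
  # -print0 ends each name with NUL; --null tells tar to expect that
  # in the list it reads from stdin (--files-from=-).
  find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print0 |
    tar -c -f ../diff.tar --null --files-from=-
)

# List the archive members (order follows find's traversal order).
tar -t -f "$work/diff.tar"
```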
Note: if the shell is POSIX (e.g., bash) you can also run find from the parent directory and get the paths of the files relative to it, if you prefer:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ find B -type f -exec sh -c '[ ! -f A"${1#B}" ]' _ {} \; -print
B/c
B/d
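The "${1#B}" expansion strips the leading B from the path so that A can be put in its place. A small illustration of that step in isolation; the variable names and the sample path B/sub/c are hypothetical:

```shell
# ${path#B} removes the shortest leading match of the pattern "B".
path='B/sub/c'
mirror="A${path#B}"
printf '%s\n' "$mirror"   # prints: A/sub/c
```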