In bash, how do I compare two folders to ensure they contain the same sets of files?
Solution 1:
If rsync is a viable option, perhaps the --itemize-changes (-i) and --dry-run options would be of use:
rsync -zaic src_dir/ dest_dir/ --dry-run
-z compresses files during transfer, -a copies in archive mode and -c bases the file comparisons on checksums rather than date modified or size.
-i lists the individual files that differ, and --dry-run means that no data is transferred; the command only generates the list.
Solution 2:
You might do something not entirely unlike:
(cd some/where; ls -lR) > somewhere.txt
(cd else/where; ls -lR) > elsewhere.txt
diff somewhere.txt elsewhere.txt
I haven't tried this. It depends on file metadata (dates etc.) being preserved (cp -p ...) and on ls sorting filenames in the same order (which it should).
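If the timestamps were not preserved, one way around that dependency is to list checksums instead of ls -l metadata. This is only a sketch: manifest is a made-up helper name, and cksum's CRC is a fairly weak content check, so swap in a stronger hash tool if you have one.

```bash
# Hypothetical helper: print "checksum size ./relative/path" for every
# regular file under the directory given as $1, sorted by path.
manifest() {
    # The subshell keeps the caller's working directory unchanged;
    # find only runs if cd succeeds.
    ( cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3 )
}

# Usage, mirroring the commands above:
#   manifest some/where > somewhere.txt
#   manifest else/where > elsewhere.txt
#   diff somewhere.txt elsewhere.txt
```

Empty directories don't show up in this listing, since only regular files are checksummed.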
Solution 3:
diff --recursive (-r) does catch file changes, even within subdirectories.
You might rather want to use diff --unified --recursive, however. It creates a unified diff, which displays changed lines prefixed with (+) for addition and (-) for removal. Conveniently, it also displays surrounding lines (i.e. context), so that you can figure out what's going on there.
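If you only need a yes/no answer in a script, the exit status of diff is probably enough: 0 means no differences, 1 means differences were found, 2 means trouble. A minimal sketch, where dirs_match is a made-up name and -q (--brief) is assumed to be available, as it is in GNU and BSD diff:

```bash
# Hypothetical helper: succeeds only if both trees contain the same
# files with identical contents (metadata is not compared).
dirs_match() {
    # -r recurses into subdirectories; -q only reports whether files differ.
    # Output is discarded because only the exit status matters here.
    diff -rq -- "$1" "$2" >/dev/null 2>&1
}

# Usage:
#   if dirs_match dir_a dir_b; then echo "same"; else echo "different"; fi
```

Note that a missing or unreadable directory also makes the helper fail, so "different" here really means "different, or diff hit an error".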
Solution 4:
diff <(cd /first/path/ && find ./ | sort) <(cd /second/path/ && find ./ | sort)
This is similar to this other answer but:
- I'm using find to generate lists of objects (files, directories); it fits here better than ls because its output contains only paths.
- sort ensures both lists are in the same order, regardless of the order in which each find lists the objects.
- The <(…) syntax avoids temporary files in bash.
- find will be executed only if the corresponding cd succeeds, thanks to the && operator. This will save you from running find in the current directory if there's a typo in any path.
Additional notes:
- Paths returned by find will be relative to directories we cd to. Make sure /first/path/ and /second/path/ correspond to each other.
- Empty output from diff indicates the two directories are identical; but remember…
- … the command operates on paths only, it doesn't check if the contents or metadata match.
- Object names with unusual characters (e.g. with newlines) will break the logic.
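For names with newlines, a NUL-delimited variant might work better. This is a sketch, not a drop-in replacement: same_paths is a made-up name, it assumes your find has -print0 and your sort has -z (GNU and recent BSD tools do), and it only answers yes or no instead of showing a diff. It still compares paths only, not contents or metadata.

```bash
# Hypothetical helper: succeeds if both directories contain the same set
# of relative paths, even when names include newlines.
same_paths() {
    # Refuse to compare if either argument is not a directory; otherwise
    # two mistyped paths would both produce empty lists and look "equal".
    [[ -d $1 && -d $2 ]] || return 2
    # NUL-terminated names are sorted with sort -z and compared bytewise.
    cmp -s \
        <(cd "$1" && find . -print0 | LC_ALL=C sort -z) \
        <(cd "$2" && find . -print0 | LC_ALL=C sort -z)
}

# Usage:
#   same_paths /first/path/ /second/path/ && echo "same paths"
```

cmp is used here instead of diff because the lists are no longer line-oriented.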