In bash, how do I compare two folders to ensure they contain the same sets of files?

Solution 1:

If rsync is a viable option, perhaps the --itemize-changes (-i) and --dry-run options would be of use:

rsync -zaic src_dir/ dest_dir/ --dry-run

-z compresses data during transfer, -a enables archive mode (recursion plus preservation of permissions, timestamps, and similar metadata), and -c compares files by checksum rather than by modification time and size.

-i lists each file that differs, and --dry-run means that no data is transferred; rsync only generates the list.

Solution 2:

You might do something not entirely unlike:

(cd some/where; ls -lR) > somewhere.txt
(cd else/where; ls -lR) > elsewhere.txt
diff somewhere.txt elsewhere.txt

I haven't tried this. It depends on file metadata (dates etc.) being preserved (cp -p ...) and on ls sorting filenames in the same order in both directories (which it should, as long as both commands run under the same locale).
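
If the metadata was not preserved, one possible variation is to list checksums instead of ls -lR output. This is only a sketch of a swapped-in technique, not what the answer above does: the function name checksum_listing is made up for illustration, and it assumes GNU coreutils sha256sum is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "checksum  path" for every regular file under
# the directory given as $1, sorted by path.
# Assumes GNU coreutils sha256sum is available.
checksum_listing() {
    # The subshell keeps the caller's working directory untouched, and
    # find only runs if cd succeeds.
    # LC_ALL=C gives a byte-wise sort order that does not depend on the locale.
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}
```

It could then be used as diff <(checksum_listing some/where) <(checksum_listing else/where). Unlike ls -lR, this ignores dates, owners and empty directories, so it answers a slightly different question: whether the regular files have the same names and contents.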

Solution 3:

diff --recursive (-r) does catch file changes, even within subdirectories.

You might rather want to use diff --unified --recursive, however. It creates a unified diff, which displays changed lines prefixed with (+) for addition and (-) for removal. Conveniently, it also displays surrounding lines (i.e. context), so that you can figure out what's going on there.
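
As a rough illustration, the command can be wrapped in a small function so that a script can act on the result. The function name compare_trees and its two arguments are placeholders made up for this sketch. It relies on diff's exit status: 0 when nothing differs, 1 when differences were found, and 2 on trouble.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a unified, recursive diff of two directory trees.
# $1 and $2 are placeholder directory arguments.
compare_trees() {
    local dir_a=$1 dir_b=$2
    # --unified shows changed lines with context; --recursive descends into
    # subdirectories. The function returns diff's own exit status.
    diff --unified --recursive "$dir_a" "$dir_b"
}
```

For example, compare_trees dir_a dir_b && echo "no differences" prints the message only when diff found nothing to report. Files that exist on one side only show up as "Only in ..." lines and also count as a difference.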

Solution 4:

diff <(cd /first/path/ && find ./ | sort) <(cd /second/path/ && find ./ | sort)

This is similar to this other answer but:

  • I'm using find to generate lists of objects (files, directories); it fits here better than ls because its output contains only paths.
  • sort ensures both lists are in the same order, regardless of the order in which each find lists the objects.
  • The <(…) syntax avoids temporary files in bash.
  • find will be executed only if the corresponding cd succeeds, thanks to the && operator. This will save you from running find in the current directory if there's a typo in any path.

Additional notes:

  • Paths returned by find will be relative to the directories we cd into. Make sure /first/path/ and /second/path/ correspond to each other.
  • Empty output from diff indicates the two directories are identical; but remember…
  • … the command operates on paths only; it doesn't check whether the contents or metadata match.
  • Object names with unusual characters (e.g. with newlines) will break the logic.
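
If names with newlines are a concern, one possible workaround is to keep the paths NUL-terminated and compare the two lists byte for byte. This is a sketch under assumptions, not a drop-in replacement: the function name same_paths is made up for illustration, it assumes a find that supports -print0 and a sort that supports -z (GNU findutils and coreutils do), and it only says whether the lists match, not where they differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both trees contain the same set of
# relative paths. Contents and metadata are not checked.
same_paths() {
    local first=$1 second=$2
    # Refuse to compare if either argument is not a directory; otherwise two
    # failed cd calls would yield two empty lists that look identical.
    [ -d "$first" ] && [ -d "$second" ] || return 2
    # -print0 and sort -z keep every path NUL-terminated, so a newline inside
    # a name cannot be mistaken for a separator.
    # LC_ALL=C makes the sort order byte-wise and independent of the locale.
    cmp -s \
        <(cd "$first" && find . -print0 | LC_ALL=C sort -z) \
        <(cd "$second" && find . -print0 | LC_ALL=C sort -z)
}
```

Usage would look like same_paths /first/path /second/path && echo "same names". The trade-off is that cmp reports only equal or not equal, so the diff command above remains more useful for seeing which paths are missing.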