Comparing the contents of two directories
I have two directories that should contain the same files and have the same directory structure.
I think that something is missing in one of these directories.
Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?
You can use the diff command just as you would use it for files:
diff <directory1> <directory2>
If you also want to compare subdirectories and the files inside them, use the -r option:
diff -r <directory1> <directory2>
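If you only care about which files are missing or different, and not about the line-by-line changes, a minimal sketch might look like the following. It assumes a diff that supports -q (GNU and BSD diff both do); the temporary directories and file names are made up purely for illustration.

```shell
# Build two small throwaway trees (hypothetical names) to compare.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/sub" "$tmp/dir2/sub"
echo "same"  > "$tmp/dir1/a.txt"
echo "same"  > "$tmp/dir2/a.txt"
echo "old"   > "$tmp/dir1/sub/b.txt"
echo "new"   > "$tmp/dir2/sub/b.txt"
echo "extra" > "$tmp/dir1/only-here.txt"

# -r recurses into subdirectories; -q reports only *which* files differ
# instead of printing their contents.
# diff exits with status 1 when it finds differences, so "|| true"
# keeps the script going if it runs under "set -e".
report=$(diff -rq "$tmp/dir1" "$tmp/dir2" || true)
printf '%s\n' "$report"
```

Files that exist on one side only should show up on lines starting with "Only in", while files present on both sides with different contents are reported as differing.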
A good way to do this comparison is to use find with md5sum, then diff.
Example
Use find to list all the files in the directory, calculate the MD5 hash of each file, sort the result by filename, and write it to a file:
find /dir1/ -type f -exec md5sum {} + | sort -k 2 > dir1.txt
Do the same for the other directory:
find /dir2/ -type f -exec md5sum {} + | sort -k 2 > dir2.txt
Then compare the two resulting files with diff:
diff -u dir1.txt dir2.txt
Or as a single command using process substitution:
diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2) <(find /dir2/ -type f -exec md5sum {} + | sort -k 2)
If you want to see only the changes:
diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ")
The cut command prints only the hash (the first field) for diff to compare. Otherwise diff would print every line, because the directory paths differ even when the hashes are the same.
But then you won't know which file changed.
For that, you can try something like:
diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /') <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')
This strategy is very useful when the two directories to be compared are not on the same machine and you need to make sure that the files are identical in both directories.
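Note that the sed expression above keeps only the base name of each file, so two files with the same name in different subdirectories become indistinguishable in the output. One possible variant, sketched here under the assumption that md5sum is available, runs find from inside each directory so the listings contain relative paths and can be compared directly. The list_hashes helper and the temporary directory layout are hypothetical names used only for this example.

```shell
# Hypothetical helper: print "hash  ./relative/path" lines for one tree.
# Running find from inside the directory keeps the paths relative, so
# two identical trees produce identical listings.
list_hashes() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Throwaway test data: a.txt is identical, sub/b.txt differs.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/sub" "$tmp/dir2/sub"
echo "same" > "$tmp/dir1/a.txt"
echo "same" > "$tmp/dir2/a.txt"
echo "old"  > "$tmp/dir1/sub/b.txt"
echo "new"  > "$tmp/dir2/sub/b.txt"

# Write one listing per directory, then compare the listings.
list_hashes "$tmp/dir1" > "$tmp/dir1.txt"
list_hashes "$tmp/dir2" > "$tmp/dir2.txt"

# diff exits non-zero when the listings differ; "|| true" ignores that.
changes=$(diff "$tmp/dir1.txt" "$tmp/dir2.txt" || true)
printf '%s\n' "$changes"
```

Because the listings are plain text files, the same approach should also work across machines: generate one listing on each host, copy one of them over, and run diff locally.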
Another good way to do the job is Git's diff command (this may cause problems when files have different permissions: every file is then listed in the output):
git diff --no-index dir1/ dir2/