Given two directory trees, how can I find out which files differ by content? [closed]
If I want to find the differences between two directory trees, I usually just run:
diff -r dir1/ dir2/
This outputs exactly what the differences are between corresponding files. I'm interested in just getting a list of corresponding files whose content differs. I assumed that this would simply be a matter of passing a command line option to diff, but I couldn't find anything on the man page.
Any suggestions?
Solution 1:
Try:
diff --brief --recursive dir1/ dir2/
Or alternatively, with the short flags -qr:
diff -qr dir1/ dir2/
If you also want to see differences for files that exist in only one of the two directories:
diff --brief --recursive --new-file dir1/ dir2/ # with long options
diff -qrN dir1/ dir2/ # with short flag aliases
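If only the bare paths are wanted rather than diff's full messages, the brief output can be filtered. The following is a minimal sketch, not part of diff itself: the function name changed_files is made up for illustration, and it assumes diff prints its usual English message "Files A and B differ" (as GNU diff does). Paths that themselves contain " and " would confuse the pattern.

```shell
#!/bin/sh
# changed_files is a hypothetical helper name, not a diff feature.
# Prints the first-tree path of every file whose content differs.
# Assumption: diff reports differences as "Files A and B differ".
changed_files() {
    # LC_ALL=C keeps the messages in English so the pattern matches.
    # "Only in ..." lines are dropped because they do not match.
    LC_ALL=C diff -qr "$1" "$2" |
        sed -n 's/^Files \(.*\) and .* differ$/\1/p'
}
```

Calling changed_files dir1 dir2 would then print one path per differing file. Files present in only one tree are skipped here; adding -N to the diff call would include them too.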
Solution 2:
The command I use is:
diff -qr dir1/ dir2/
It is exactly the same as Mark's :) But his answer bothered me as it uses different types of flags, and it made me look twice. Using Mark's more verbose flags it would be:
diff --brief --recursive dir1/ dir2/
I apologise for posting when the other answer is perfectly acceptable. Could not stop myself... working on being less pedantic.
Solution 3:
I like to use git diff --no-index dir1/ dir2/, because it can show the differences in color (if you have that option set in your git config) and because it shows all of the differences in a long paged output using "less".
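For just a list of files rather than the full paged diff, git's --name-status option can be combined with --no-index. This is a sketch under the assumption that git is installed; the wrapper name git_changed is invented here. git diff --no-index exits with status 1 when it finds differences, so the wrapper treats 0 and 1 as success.

```shell
#!/bin/sh
# git_changed is a hypothetical wrapper name, not a git command.
# Prints one line per differing path, prefixed by a status letter
# (for example M for modified content).
git_changed() {
    status=0
    # --no-pager keeps the output on stdout instead of opening less.
    git --no-pager diff --no-index --name-status "$1" "$2" || status=$?
    # 0 = trees identical, 1 = differences found; higher means an error.
    [ "$status" -le 1 ]
}
```

Running git_changed dir1/ dir2/ should list only the paths that differ, which is closer to what the question asks for than the full diff.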
Solution 4:
Using rsync:
rsync --dry-run --recursive --delete --links --checksum --verbose /dir1/ /dir2/ > dirdiff_2.txt
Alternatively, using diff:
diff --brief --recursive --no-dereference --new-file --no-ignore-file-name-case /dir1 /dir2 > dirdiff_1.txt
They are functionally equivalent, but which one is faster depends on where the directories reside:
- If the directories are on the same drive, rsync is faster.
- If the directories reside on two separate drives, diff is faster.
This is because diff puts an almost equal load on both directories in parallel, maximizing load on the two drives. rsync calculates checksums in large chunks before actually comparing them. That groups the I/O operations into large chunks and leads to more efficient processing when everything takes place on a single drive.