Check the correctness of copied files
Solution 1:
I'm using hashdeep to verify backups/restores and occasionally to check for file system corruption in a RAID.
The speed depends on which hash functions you use (some are more CPU intensive than others) as well as the read speed of your disks. On my system hashdeep can process or verify around 1 TB/hour with md5 and 300 MB/s read speed.
Example of calculating checksums and storing them in a file:
cd my-data
hashdeep -rlc md5 . > ~/checksums.txt
Parameters:
- -r – recursive
- -l – use relative paths
- -c – specify hash function
- . – recursive starting at the current directory
- > – redirect output to the specified file
See the man page.
Example of verifying checksums and printing a list of differences:
$ cd /mnt/my-backup
$ hashdeep -ravvl -k ~/checksums.txt .
hashdeep: Audit passed
Files matched: 40914
Files partially matched: 0
Files moved: 0
New files found: 0
Known files not found: 0
Parameters:
- -a – audit (compare with the list of known checksums)
- -v – verbose (to get a listing of mismatches; multiple v's mean more verbose)
- -k – file of known hashes
Note that as of March 2016 hashdeep appears to be abandoned.
Solution 2:
This looks like a perfect task for rsync, which compares files and copies only the differences.
The rsync utility first popped into my mind when I saw your question. Something like the command below can quickly show which files in directory a are missing from b or differ in content:
$ rsync -rcnv a/* b/
-r will recurse into the directories
-c will compare based on file checksum
-n will run it as a "dry run" and make no changes, but just print out the files
that would be updated
-v will print the output to stdout verbosely
This is a good option because you can compare the contents of the files as well to make sure they match. rsync's delta algorithm is optimized for this type of use case. Then if you want to make b match the contents of a, you can just remove the -n option to perform the actual sync. Note that files existing only in b are left alone unless you also pass --delete.
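As a variation on the dry run above, here is a minimal sketch that wraps it in a function. The name `rsync_diff` is hypothetical. It assumes rsync is installed, passes the source with a trailing slash instead of a shell glob so that top-level dotfiles are included, and adds `-i` (`--itemize-changes`) and `--delete` so that files existing only in the destination are reported too. Because `-n` is kept, nothing is copied or deleted.

```shell
#!/bin/sh
# Hypothetical helper: rsync_diff SRC DST
# Dry run only (-n): nothing is copied or deleted.
rsync_diff() {
    # "${1%/}/" guarantees exactly one trailing slash, which tells rsync to
    # compare the contents of SRC (dotfiles included) with the contents of DST.
    # -r recurse, -c compare by checksum, -n dry run,
    # -i print one itemized line per item that would change,
    # --delete also report files that exist only in DST ("*deleting ...").
    rsync -rcn -i --delete "${1%/}/" "${2%/}/"
}
```

Calling `rsync_diff a b` prints one line per file that would be transferred or deleted; each line names a file that is missing, extra, or different in content.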
Some related questions:
- https://stackoverflow.com/questions/19396718/compare-files-in-two-directory-on-remote-server-using-unix
- https://unix.stackexchange.com/questions/57305/rsync-compare-directories
Solution 3:
If the GUI apps suggested over at "File and directory comparison tool?" don't do it for you, try diff -rq /path/to/one /path/to/other to recurse through both directories quietly, printing only which files differ or exist on one side only.
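If you want to use this check from a script, diff's exit status is enough: 0 means no differences, 1 means differences were found, and anything higher means diff ran into trouble. A minimal sketch, where the function name `compare_dirs` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: compare_dirs DIR1 DIR2
# Prints the files that differ (via diff -rq) and a one-line summary.
compare_dirs() {
    status=0
    diff -rq "$1" "$2" || status=$?
    case $status in
        0) echo "identical"; return 0 ;;
        1) echo "differences found"; return 1 ;;
        *) echo "diff could not compare the directories" >&2; return 2 ;;
    esac
}
```

For example, `compare_dirs /path/to/one /path/to/other && echo "copy verified"` prints the confirmation only when both trees have the same files with the same contents.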
Solution 4:
The situation you describe is fairly complex. However, you can write a script that calculates the MD5 of every file you want to copy and later compares those values with the MD5s of the copied files:
- http://dll.nu/md5i/
- http://www.unix.com/unix-desktop-dummies-questions-answers/156854-script-compare-md5.html
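Such a script could be sketched as follows. This is only one possible approach and not taken from the links above. It assumes GNU coreutils (`md5sum` with `-c --quiet`) and `find`, and the function name `verify_copy` is hypothetical. It only checks that every file in the source exists in the copy with the same MD5, so extra files in the copy go unnoticed, and an empty source directory is reported as a failure because there is nothing to check.

```shell
#!/bin/sh
# Hypothetical helper: verify_copy SRC DST
# Returns 0 if every regular file under SRC exists under DST with the same MD5.
verify_copy() {
    src=$1
    dst=$2
    sums=$(mktemp) || return 2
    # Hash every regular file under SRC, using paths relative to SRC.
    if ! (cd "$src" && find . -type f -exec md5sum {} + > "$sums"); then
        rm -f "$sums"
        return 2
    fi
    # Re-hash the same relative paths under DST.
    # --quiet suppresses the "OK" lines, so only failures are printed.
    status=0
    (cd "$dst" && md5sum -c --quiet "$sums") || status=$?
    rm -f "$sums"
    return "$status"
}
```

Usage would look like `verify_copy my-data /mnt/my-backup && echo "all files match"`.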
If you want something simple and fast (it will not work in very complex scenarios), you can use Meld:
sudo apt-get install meld