Check the correctness of copied files

Solution 1:

I use hashdeep to verify backups and restores, and occasionally to check for file system corruption on a RAID.

The speed depends on which hash functions you use (some are more CPU intensive than others) as well as the read speed of your disks. On my system hashdeep can process or verify around 1 TB/hour with md5 and 300 MB/s read speed.
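
As a rough sanity check on those figures, the arithmetic can be sketched in shell. This assumes the disk read speed is the bottleneck and uses decimal units (1 GB = 1000 MB); estimate_hours is a hypothetical helper name, not part of hashdeep.

```shell
# Estimate how many hours it takes to hash a data set when disk reads
# are the limiting factor. Hypothetical helper, for illustration only.
#   $1 = data size in GB (decimal), $2 = sustained read speed in MB/s
estimate_hours() {
    awk -v gb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", gb * 1000 / mbps / 3600 }'
}

# 1 TB (1000 GB) at 300 MB/s takes a little under an hour
estimate_hours 1000 300
```

A slower hash function or a CPU-bound system would take longer than this estimate.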


Example of calculating checksums and storing them in a file:

cd my-data
hashdeep -rlc md5 . > ~/checksums.txt

Parameters:

  • r – recursive
  • l – use relative paths
  • c – specify hash function
  • . – start at the current directory
  • > – redirect output to the specified file

See the man page.


Example of verifying checksums and printing a list of differences:

$ cd /mnt/my-backup
$ hashdeep -ravvl -k ~/checksums.txt .
hashdeep: Audit passed
          Files matched: 40914
Files partially matched: 0
            Files moved: 0
        New files found: 0
  Known files not found: 0

Parameters:

  • a – audit (compare with the list of known checksums)
  • v – verbose (lists the mismatches; repeating v increases verbosity)
  • k – file of known hashes

Note that as of March 2016 hashdeep appears to be abandoned.

Solution 2:

This looks like the perfect task for rsync, which compares files and copies only the differences.

The rsync utility first popped into my mind when I saw your question. Something like the command below quickly shows which files in directory a are missing from b or differ from the copy in b:

$ rsync -rcnv a/ b/

-r will recurse into the directories
-c will compare based on file checksum
-n will run it as a "dry run" and make no changes, but just print out the files 
   that would be updated
-v will print the output to stdout verbosely

This is a good option because it compares the contents of the files, not just their names, sizes, or timestamps. Note that -c makes rsync read and checksum every file on both sides, so it is slower than the default check of size and modification time. If you then want to make b match the contents of a, just remove the -n option to perform the actual sync.
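
For scripting, the dry run can be wrapped in a small function. This is only a sketch: list_changed is a hypothetical helper name, and it relies on rsync's --out-format option with the %n (file name) escape to print one path per line instead of the verbose listing.

```shell
# Print the files under $1 that are missing from $2 or whose checksum
# differs. Nothing is copied because of -n (dry run).
# list_changed is a hypothetical helper name, for illustration only.
list_changed() {
    # The trailing slashes make rsync compare the directory contents.
    # grep drops directory entries (names ending in "/"); as a side
    # effect the exit status is 0 only if at least one file is listed.
    rsync -rcn --out-format='%n' "$1"/ "$2"/ | grep -v '/$'
}

# Usage:
#   list_changed a b
```

The check is one-directional: files that exist only in b are not reported.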

Some related questions:

  • https://stackoverflow.com/questions/19396718/compare-files-in-two-directory-on-remote-server-using-unix
  • https://unix.stackexchange.com/questions/57305/rsync-compare-directories

Solution 3:

If the GUI apps suggested over at "File and directory comparison tool?" don't do it for you, try diff -rq /path/to/one /path/to/other. It recurses through both directories and prints only which files differ or exist on one side only, not the differences themselves.
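
In a script, diff's exit status can be used instead of reading the output. A minimal sketch, where compare_trees is a hypothetical helper name; it relies on diff's documented exit codes (0 for no differences, 1 for differences, 2 for trouble):

```shell
# Compare two directory trees and summarize the result.
# compare_trees is a hypothetical helper name, for illustration only.
compare_trees() {
    status=0
    diff -rq "$1" "$2" || status=$?
    case $status in
        0) echo "No differences found." ;;
        1) echo "Differences found (listed above)." ;;
        *) echo "diff could not complete the comparison." >&2 ;;
    esac
    return "$status"
}

# Usage:
#   compare_trees /path/to/one /path/to/other
```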

Solution 4:

The situation you describe is fairly complex. You can, though, write a script that calculates the MD5 of all the files you want to copy and later compares them with the checksums of the copies:

  • http://dll.nu/md5i/
  • http://www.unix.com/unix-desktop-dummies-questions-answers/156854-script-compare-md5.html
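
A minimal sketch of such a script, assuming GNU coreutils (md5sum with -c and --quiet) is available. It is not taken from the linked pages; make_manifest and verify_copy are hypothetical helper names.

```shell
# Print an MD5 manifest ("hash  ./relative/path") for every regular
# file under directory $1. Hypothetical helper, for illustration only.
make_manifest() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Check the files under directory $2 against manifest file $1.
# Only failures are printed; the exit status is non-zero if any file
# is missing or its checksum differs. Hypothetical helper.
verify_copy() {
    # Resolve the manifest to an absolute path before changing directory.
    manifest=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    (cd "$2" && md5sum --quiet -c "$manifest")
}

# Usage:
#   make_manifest /path/to/source > /tmp/source.md5
#   verify_copy /tmp/source.md5 /path/to/copy
```

This only checks the files listed in the manifest, so extra files in the copy go unnoticed.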

If you want something simple and fast (it will not work in very complex scenarios), you can use Meld:

sudo apt-get install meld