What tool(s) would you use to verify that a restored file structure is whole and complete? My environment is a Windows Server 2008 file server. (We use tape for backup, but that is inconsequential.)

I am specifically looking for a tool that will:

  • Record the names of all files and folders below a specified directory
  • Optionally calculate checksums of each file encountered
  • Save this index in a human-readable format
  • Compare the index against restored data and show differences

Some background: I recently had to replace the disks in our file server. The upgrade was scheduled to start 36 hours after the most recent full backup, so I created a differential backup. However, it turns out that one of our applications was clearing the archive bit on files saved to the server, so these were not included in the differential backup. I was unaware of this until my users reported some files as missing.

Aside from this, are there any other common methods for validating the integrity and completeness of a restore? I am frequently told that testing backups by restoring them is the only way to know that backups are working, but how do you deal with the case where it works 99% correctly and the other 1% silently fails?


Update: Apparently I need to clarify a few things.

  • I do already use full backups when possible, but sometimes the situation calls for a differential backup. When that happens, I need to verify that every file in the original data is also in the restored data.
  • I am already using the "verify" feature in Backup Exec, but that only ensures that everything written to tape can be read back again.
  • I do conduct occasional spot-check restores to ensure that the backup media is intact.

I am already familiar with the common wisdom that "the best way to test a backup is to restore it." This is a necessary step, but it is NOT sufficient. Being able to restore the files you backed up does NOT guarantee that all the files you need were backed up in the first place. That is the problem I need solved.


Solution 1:

A variety of tools available on Linux are well suited to this task. You can use mount.cifs to mount Windows shared folders on a Linux host, or you could just run Cygwin right on the file server.

Before starting the backup, use the find command to recursively list everything below a specified directory and write the results to a file. Sorting the output keeps the listing stable, because find does not guarantee any particular order. This listing can be saved along with the backup for future use.

find /path/to/dir | sort > list_before.txt

If you want checksums calculated for every file, pass the file names to md5sum (on BSD-derived systems the equivalent command is md5). The -type f test restricts the listing to regular files, because directories have no content to hash.

find /path/to/dir -type f -print0 | xargs -0 md5sum > md5_before.txt
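
Once the data has been restored, the saved checksums can be checked against the restored files instead of being compared by hand. The following is a minimal sketch, assuming GNU coreutils and findutils (md5sum with -c and --quiet, sort -z, xargs -r), as found on Linux and Cygwin. The function names record_checksums and verify_checksums are made up for illustration. Paths are recorded relative to the data root so that the restore may live in a different location.

```shell
# record_checksums DIR
# Hypothetical helper: prints an md5sum line for every regular file below
# DIR, using paths relative to DIR and a stable (byte-wise) sort order.
record_checksums() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r md5sum )
}

# verify_checksums RESTORED_DIR CHECKSUM_FILE
# Hypothetical helper: re-checks every recorded file inside RESTORED_DIR.
# With --quiet, only files that are missing or changed are reported.
# The exit status is non-zero if any file fails the check.
verify_checksums() {
    (
        # Resolve the checksum file to an absolute path before changing directory.
        sums=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
        cd "$1" && md5sum -c --quiet "$sums"
    )
}
```

For example, run record_checksums /path/to/dir > md5_before.txt before the backup, then verify_checksums /path/to/restored md5_before.txt after the restore. This check reports files that are missing or whose contents differ. It does not report extra files that exist only in the restored tree.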

After restoring the backup, build another file list with the same command, saving it as list_after.txt, then use diff to compare the two lists. Ideally, diff produces no output.

diff list_before.txt list_after.txt
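
diff reports differences in both directions, and it flags every line if the restored data sits under a different path than the original. To list only the entries that are missing from the restore, one option is to build the listings with root-relative paths and compare them with comm. This is a sketch under those assumptions, not a tested procedure for any particular backup product. list_tree and missing_from_restore are hypothetical helper names, and file names containing newlines are not handled.

```shell
# list_tree DIR
# Hypothetical helper: prints every file and folder below DIR, one per line,
# relative to DIR and sorted byte-wise so that comm can consume the result.
list_tree() {
    ( cd "$1" && find . | LC_ALL=C sort )
}

# missing_from_restore ORIGINAL_LIST RESTORED_LIST
# Prints entries that appear in ORIGINAL_LIST but not in RESTORED_LIST.
# comm -23 suppresses lines unique to the second file and lines common to both.
missing_from_restore() {
    LC_ALL=C comm -23 "$1" "$2"
}
```

A typical run would be list_tree /path/to/dir > list_before.txt before the backup, list_tree /path/to/restored > list_after.txt after the restore, and then missing_from_restore list_before.txt list_after.txt. Empty output means every recorded name exists in the restored tree.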