Hash files in a tar file

I have two *.tar files with similar contents, and I want to find out which files are the same in both. Many of the files are big, so comparing hashes would require extracting every file from each tar and computing its hash. Is there a way to hash files in a tar without having to extract it? Is there another way to compare files across two *.tar files?


If it's GNU tar, run this:

tar -xf file1.tar --to-command=file-stats-from-tar

where file-stats-from-tar is an executable script somewhere in $PATH containing:

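#!/bin/bash

# tar feeds each member's contents to this script on stdin
# and sets TAR_FILENAME to the member's name.
md5=$(md5sum)     # md5sum prints "<hash>  -" for stdin
md5=${md5%% *}    # keep only the hash

printf '%s\t%s\n' "$md5" "$TAR_FILENAME"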

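Replace md5sum with another hashing tool, such as sha256sum, if you need to.

This does it all in a single pass over the archive, without writing the extracted files to disk.

It works because the --to-command option tells tar to pipe each file's contents separately to the standard input of the command you specify, with a number of environment variables set (only TAR_FILENAME is used here).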
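For the comparison half of the question, here is a minimal sketch, assuming GNU tar and bash, that builds such a listing for each archive and prints the files that appear in both with identical contents. It inlines the helper's logic into the --to-command string instead of using a separate script; this assumes GNU tar runs that string through a shell, so $TAR_FILENAME is expanded per file. The function names tar_hashes and same_files are hypothetical.

```shell
#!/bin/bash
# tar_hashes ARCHIVE: print "md5<TAB>name" for each regular file in ARCHIVE,
# streaming every member through md5sum via --to-command (GNU tar assumed).
# The single quotes keep bash from expanding anything; the shell that tar
# starts for each member expands $(...) and $TAR_FILENAME instead.
tar_hashes() {
  tar -xf "$1" \
    --to-command='printf "%s\t%s\n" "$(md5sum | cut -d " " -f 1)" "$TAR_FILENAME"' |
    LC_ALL=C sort
}

# same_files A B: print the lines common to both listings, i.e. files that
# have the same name and the same MD5 hash in both archives.
# comm needs both inputs sorted with the same collation, hence LC_ALL=C.
same_files() {
  LC_ALL=C comm -12 <(tar_hashes "$1") <(tar_hashes "$2")
}
```

For example, same_files a.tar b.tar (placeholder archive names) prints one hash<TAB>name line per matching file. Using comm -3 instead of comm -12 would show the lines unique to each archive instead.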


There may be more efficient ways, but I was able to come up with this in a few moments:

tar tf test.tar | while IFS= read -r x ; do echo "$(tar xfO test.tar "$x" | md5sum) $x" ; done

The first tar tf just lists the files in the archive, and each name is passed into the while read bash loop. For each filename, the loop extracts that single file to standard output (the O option) and pipes it into md5sum; you could obviously replace md5sum with your preferred hash tool. The odd-looking echo "$(...)" wrapper with the filename appended keeps the output similar to a regular hash listing, with the hashes on the left and the filenames on the right. Without it you just get the hashes of all the files and no names, so you can't tell which hash belongs to which file. Even with it, there is an extra column of - in the output that isn't normally there; it is the name md5sum uses for standard input. It could easily be removed by adding a colrm command to the pipeline.
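The same loop can be wrapped in a function that skips directory entries and prints md5sum-style "hash  name" lines, dropping the extra - column with shell parameter expansion instead of colrm. This is a sketch assuming bash and an archive whose directory entries are listed with a trailing / (as they are in archives created by GNU tar); hash_members is a hypothetical name, and it still re-reads the archive once per member.

```shell
#!/bin/bash
# hash_members ARCHIVE: print "md5  name" for each file in ARCHIVE,
# extracting one member at a time to stdout (tar -O) and hashing it.
hash_members() {
  local archive=$1 name sum
  # IFS= and -r keep leading spaces and backslashes in member names intact.
  tar -tf "$archive" | while IFS= read -r name; do
    case $name in */) continue ;; esac      # skip directory entries
    sum=$(tar -xOf "$archive" "$name" | md5sum)
    printf '%s  %s\n' "${sum%% *}" "$name"  # "${sum%% *}" drops "  -"
  done
}
```

Calling hash_members test.tar prints the same listing as the one-liner, minus the - column and any directory lines.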

This is probably not the most efficient approach, since it has to read the tar file n+1 times if there are n files in it, but hopefully the tar contents are cached after the first read.