How to check if a Unix .tar.gz file is a valid file without uncompressing?
I have found the question How to determine if data is valid tar file without a file?, but I was wondering: is there a ready made command line solution?
What about just getting a listing of the tarball and throw away the output, rather than decompressing the file?
tar -tzf my_tar.tar.gz >/dev/null
Edited as per comment. Thanks zrajm!
Edit as per comment. Thanks Frozen Flame! This test in no way implies integrity of the data. Because it was designed as a tape archival utility most implementations of tar will allow multiple copies of the same file!
you could probably use the gzip -t option to test the files integrity
http://linux.about.com/od/commands/l/blcmdl1_gzip.htm
from: http://unix.ittoolbox.com/groups/technical-functional/shellscript-l/how-to-test-file-integrity-of-targz-1138880
To test the gzip file is not corrupt:
gunzip -t file.tar.gz
To test the tar file inside is not corrupt:
gunzip -c file.tar.gz | tar -t > /dev/null
As part of the backup you could probably just run the latter command and check the value of $? afterwards for a 0 (success) value. If either the tar or the gzip has an issue, $? will have a non zero value.
If you want to do a real test extract of a tar file without extracting to disk, use the -O option. This spews the extract to standard output instead of the filesystem. If the tar file is corrupt, the process will abort with an error.
Example of failed tar ball test...
$ echo "this will not pass the test" > hello.tgz
$ tar -xvzf hello.tgz -O > /dev/null
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
$ rm hello.*
Working Example...
$ ls hello*
ls: hello*: No such file or directory
$ echo "hello1" > hello1.txt
$ echo "hello2" > hello2.txt
$ tar -cvzf hello.tgz hello[12].txt
hello1.txt
hello2.txt
$ rm hello[12].txt
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz -O
hello1.txt
hello1
hello2.txt
hello2
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz
hello1.txt
hello2.txt
$ ls hello*
hello1.txt hello2.txt hello.tgz
$ rm hello*
You can also check contents of *.tag.gz file using pigz
(parallel gzip) to speedup the archive check:
pigz -cvdp number_of_threads /[...]path[...]/archive_name.tar.gz | tar -tv > /dev/null
A nice option is to use tar -tvvf <filePath>
which adds a line that reports the kind of file.
Example in a valid .tar file:
> tar -tvvf filename.tar
drwxr-xr-x 0 diegoreymendez staff 0 Jul 31 12:46 ./testfolder2/
-rw-r--r-- 0 diegoreymendez staff 82 Jul 31 12:46 ./testfolder2/._.DS_Store
-rw-r--r-- 0 diegoreymendez staff 6148 Jul 31 12:46 ./testfolder2/.DS_Store
drwxr-xr-x 0 diegoreymendez staff 0 Jul 31 12:42 ./testfolder2/testfolder/
-rw-r--r-- 0 diegoreymendez staff 82 Jul 31 12:42 ./testfolder2/testfolder/._.DS_Store
-rw-r--r-- 0 diegoreymendez staff 6148 Jul 31 12:42 ./testfolder2/testfolder/.DS_Store
-rw-r--r-- 0 diegoreymendez staff 325377 Jul 5 09:50 ./testfolder2/testfolder/Scala.pages
Archive Format: POSIX ustar format, Compression: none
Corrupted .tar file:
> tar -tvvf corrupted.tar
tar: Unrecognized archive format
Archive Format: (null), Compression: none
tar: Error exit delayed from previous errors.