Check the total content size of a tar gz file
How can I extract the size of the total uncompressed file data in a .tar.gz file from the command line?
This works for any file size:
zcat archive.tar.gz | wc -c
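A minimal sketch of the same idea wrapped in a function. On some systems (macOS, for example) zcat expects a .Z suffix, so gzip -dc is used here as the more portable spelling. The name gz_stream_size is just a placeholder I picked, not a standard command. Note that for a .tar.gz this counts the whole tar stream, including tar headers and padding, not only the file contents.

```shell
# Print the size in bytes of the decompressed stream of a gzip file.
# gz_stream_size is a hypothetical helper name, not a standard command.
gz_stream_size() {
    # gzip -dc writes the decompressed data to stdout; wc -c counts the bytes.
    # tr strips the leading padding that some wc implementations print.
    gzip -dc "$1" | wc -c | tr -d ' '
}
```

Usage would be something like: gz_stream_size archive.tar.gz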
For files smaller than 4 GiB you could also use the -l option with gzip:
$ gzip -l compressed.tar.gz
         compressed        uncompressed  ratio uncompressed_name
                132               10240  99.1% compressed.tar
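The size limit comes from the gzip format itself: the last four bytes of a gzip member (the ISIZE field in RFC 1952) hold the uncompressed size modulo 2^32. As a rough sketch, you can read that field directly. The name gz_isize is made up for this example, and the snippet assumes a little-endian host (od reads the bytes in native order) and a single-member gzip file.

```shell
# Read the ISIZE trailer (last 4 bytes) of a gzip file as an unsigned integer.
# gz_isize is a hypothetical helper name. Assumes a little-endian host, because
# od interprets the bytes in native byte order and gzip stores them little-endian.
gz_isize() {
    # tail -c 4 keeps the last four bytes; od -An -tu4 prints them as one
    # unsigned 32-bit decimal with no address column; tr removes the padding.
    tail -c 4 "$1" | od -An -tu4 | tr -d ' '
}
```

Because the field is only 32 bits wide, the value wraps around for anything of 4 GiB or more, so it is only a shortcut for smaller archives.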
This will sum the total content size of the extracted files:
$ tar tzvf archive.tar.gz | sed 's/ \+/ /g' | cut -f3 -d' ' | sed '2,$s/^/+ /' | paste -sd' ' | bc
The output is given in bytes.
Explanation: tar tzvf lists the files in the archive in verbose format like ls -l. sed and cut isolate the file size field. The second sed puts a + in front of every size except the first and paste concatenates them, giving a sum expression that is then evaluated by bc.
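The same sum can be sketched with awk instead of the sed/cut/paste/bc chain. This assumes GNU tar's verbose listing, where the size is the third whitespace-separated field (bsdtar uses a different layout, and a user or group name containing spaces would shift the fields). tar_content_size is a name I made up for the example.

```shell
# Sum the sizes of all entries listed in a .tar.gz archive.
# tar_content_size is a hypothetical helper name. Assumes GNU tar output,
# where field 3 of each verbose listing line is the entry size in bytes.
tar_content_size() {
    # awk adds up field 3 of every line; printf "%.0f" avoids exponent
    # notation for large totals in some awk implementations.
    tar tzvf "$1" | awk '{ sum += $3 } END { printf "%.0f\n", sum }'
}
```

Usage would be something like: tar_content_size archive.tar.gz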
Note that this doesn't include metadata, so the disk space taken up by the files when you extract them is going to be larger - potentially many times larger if you have a lot of very small files.
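To get a rough feel for that overhead, one option is to round every file up to a whole number of filesystem blocks before summing. This is only an estimate under stated assumptions: a fixed block size (4096 bytes by default here), GNU tar's listing format, and no allowance for directories, inodes, or filesystem features such as compression. tar_disk_estimate is a hypothetical name.

```shell
# Estimate on-disk usage after extraction by rounding each entry up to a
# whole number of blocks. tar_disk_estimate is a hypothetical helper name.
# $1 = archive, $2 = assumed block size in bytes (defaults to 4096).
tar_disk_estimate() {
    tar tzvf "$1" | awk -v b="${2:-4096}" '
        # Field 3 is the entry size in GNU tar verbose output.
        { sum += int(($3 + b - 1) / b) * b }
        END { printf "%.0f\n", sum }'
}
```

With many very small files the estimate ends up far above the plain byte sum, which is the effect described above.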
The command gzip -l archive.tar.gz doesn't work correctly with file sizes greater than 2 GB. I would recommend zcat archive.tar.gz | wc --bytes instead for really large files.
I know this is an old answer, but I wrote a tool just for this two years ago. It's called gzsize and it gives you the uncompressed size of a gzip'ed file without actually decompressing the whole file on disk:
$ gzsize <your file>
Use the following command:
tar -xzf archive.tar.gz --to-stdout | wc -c