Find and search inside all compressed files
I'd like to scan my hard drive for all compressed archives (zip, gzip, bzip2, and others) and search their contents for certain file types, such as images. Antivirus programs do this, so I believe there should be a way.
The simplest approach would be to list the contents of the archive and look for files with the relevant extension. For example, with a zip file:
$ zip -sf foo.zip | grep -iE '\.png$|\.jpg$'
file1.jpg
file1.png
file2.jpg
file2.png
The -sf option tells zip to list the files contained in an archive. The grep then looks for a .png or .jpg at the end of the line ($). The -E enables extended regular expressions, so we can use | as OR, and the -i makes the matching case insensitive.
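To see what the pattern accepts, it can be tried on a few names without any archive involved. This is only an illustration: the file names are invented, and filter_images is a hypothetical helper name.

```shell
# Hypothetical helper: keep only names ending in .png or .jpg (any case).
filter_images() {
    grep -iE '\.png$|\.jpg$'
}

# Feed some invented file names through the same pattern used above.
printf '%s\n' photo.JPG notes.txt logo.png png_notes.md picture.jpg.bak |
    filter_images
```

Only photo.JPG and logo.png are printed. png_notes.md is skipped because the pattern requires a literal dot before the extension, and picture.jpg.bak is skipped because .jpg is not at the end of the line.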
However, each archive tool has a different command to list the contents. I've written a script that can deal with most of the more popular ones. If you save that script as list_compressed.sh, you could then run:
list_compressed.sh foo.zip | grep -iE '\.png$|\.jpg$|\.jpeg$|\.gif$|\.tif$|\.tiff$'
That would show you the most common image types. Note that this approach assumes that the file type can be determined by the file's extension. It will not find image files that don't have an extension, and it will not recognize files with the wrong extension. There is no way to deal with that without actually extracting the files from the archive and running file on each of them.
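For anyone who does need content-based detection, the extraction step could look roughly like this for a gzipped tar archive. It is a sketch rather than a tested tool: images_in_tgz is a hypothetical name, and it assumes tar, mktemp and a file command that supports --mime-type are installed.

```shell
# Hypothetical helper: print the members of a .tgz archive that `file`
# identifies as images, whatever their extension says.
images_in_tgz() {
    tmp=$(mktemp -d) || return 1
    # Unpack into a scratch directory so nothing lands in the current one.
    if tar -xzf "$1" -C "$tmp"; then
        # file prints "name: type/subtype"; keep only the image/* lines.
        (cd "$tmp" && find . -type f -exec file --mime-type {} + | grep ': *image/')
    fi
    # Always clean up the scratch directory.
    rm -rf "$tmp"
}
```

It would be called as images_in_tgz some_archive.tgz. Unpacking every archive is much slower than listing it and needs enough free space for the extracted files, which is why the extension-based approach is usually the practical one.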
If you want to find all archives on your hard drive that contain image files, combine the script with find:
# The \( ... \) grouping makes -print0 apply to every -name test, not just the last.
find / \( -name '*.gz' -o -name '*.tgz' -o -name '*.zip' \) -print0 |
    while IFS= read -r -d '' arch; do
        list_compressed.sh "$arch" |
            grep -qiE '\.png$|\.jpg$|\.jpeg$|\.gif$|\.tif$|\.tiff$' &&
            echo "$arch contains image(s)"
    done
The find command searches for all .gz, .tgz or .zip files (you can add as many extensions as you like), and each one is then passed to my script. The -q suppresses grep's normal output, so nothing is printed. The && echo prints the archive's name only if the grep was successful.
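To give a feel for what a per-format listing script involves, here is a minimal sketch. It is not the script referred to above: list_archive is a hypothetical name, only a handful of formats are covered, and it assumes tar and unzip are installed.

```shell
# Hypothetical helper, not the script referred to above: print the member
# names of one archive, choosing the listing command from the extension.
list_archive() {
    case $1 in
        *.tar.gz|*.tgz)   tar -tzf "$1" ;;        # gzipped tar
        *.tar.bz2|*.tbz2) tar -tjf "$1" ;;        # bzip2-compressed tar
        *.tar)            tar -tf "$1" ;;         # plain tar
        *.zip)            unzip -Z1 "$1" ;;       # zipinfo mode, one name per line
        *.gz)             basename "$1" .gz ;;    # plain gzip holds a single file
        *) echo "unsupported archive type: $1" >&2; return 1 ;;
    esac
}
```

The order of the patterns matters: *.tar.gz has to come before *.gz, otherwise a gzipped tar would be treated as a single compressed file. The match is case sensitive, so an archive named FOO.ZIP would need its own pattern.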
Not as advanced as terdon's answer, but this will do:
Save the following code as finda.sh (or any other name you like) in the folder where you keep your scripts:
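for file in *.*; do
    # 7z exits with a non-zero status if it cannot read the file as an archive
    if 7z l -slt "$file" > "/tmp/$file.log"; then
        echo "$file:"
        # Show only the "Path = ..." lines of the technical listing
        grep '^Path = ' "/tmp/$file.log"
    fi
done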
Then run it in a directory that contains your archives. This is the output:
./finda.sh
one.7z:
Path = one/abradabra.png
Path = one/birb.png
three.rar:
Path = three/blah.png
Path = three/qwa0g.jpg
two.zip:
Path = two/whut.png
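Since the question asks about images specifically, the listing above could be narrowed with a small filter. This is a sketch that assumes the output format shown above; only_images is a hypothetical name.

```shell
# Hypothetical filter: keep the "archive:" header lines and the paths
# that end in a common image extension, dropping the "Path = " prefix.
only_images() {
    grep -iE ':$|\.(png|jpe?g|gif|tiff?)$' | sed 's/^Path = //'
}
```

It would be used as ./finda.sh | only_images. Header lines are kept even for archives that contain no images, so an archive name followed directly by another archive name means nothing matched in the first one.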