Use find to delete all images under given dimensions

I just used Recuva and Photorec to recover some data off an accidentally-formatted drive. Naturally, the result is every intact file that ever existed on the drive during its lifetime. This means tens of thousands of small icon images, both PNGs and JPGs, that I would like to delete - let's say anything under 100 x 100px.

There are solutions out there, but they're all fairly complicated, and even the simplest rely on piping through several external programs and rm to do the actual deleting. This is not ideal: for anything I do often enough in bash, I prefer a one-liner that I can remember and type out on the command line each time.

In an era of a much more fully-fledged GNU find that comes with -delete, is there really no way to do this entirely or even mostly within find itself?

EDIT: If find won't cut it, I would also be happy to use any other GNU tool.


Solution 1:

In an era of a much more fully-fledged GNU find that comes with -delete, is there really no way to do this entirely or even mostly within find itself?

find is not for reading image (meta)data (compare "DOTADIW": do one thing and do it well). To perform an arbitrary test, use -exec as a test and then -delete. The general shape is:

find . -type f -exec some_program -with -options -that -test -dimensions {} \; -delete

Iff some_program returns exit status 0 for a file, then -delete will kick in for that file.

For more complicated tests you may need an inner shell:

find . -type f -exec sh -c 'shell-code "$1" | with-pipes && con-di-tio-nals -and -such' arbitrary-name {} \; -delete

Iff sh returns exit status 0 then -delete will kick in.
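
For the question's 100 x 100 case, a sketch of this pattern could look as follows (assuming ImageMagick's identify is available for the dimension check, as in the code further below; the name check-dims is as arbitrary as arbitrary-name above):

find . -type f \( -iname '*.png' -o -iname '*.jpg' \) -exec sh -c '
  # identify prints "WIDTH HEIGHT"; treat files it cannot read as non-matching
  dims=$(identify -format "%w %h" "$1" 2>/dev/null) || exit 1
  # reuse the positional parameters: $1 becomes the width, $2 the height
  set -- $dims
  # exit 0 (a match for find) only if both dimensions are under 100
  [ "$1" -lt 100 ] && [ "$2" -lt 100 ]
' check-dims {} \; -delete

The \( … \) group restricts the test to the two file types from the question; without it, every regular file would be fed to identify.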

One big advantage: you can do this safely even if there are newlines, spaces or special characters in filenames. The code is robust.

One big disadvantage: -exec … \; will run one some_program per file, or, with the second form, one sh (plus shell-code, with-pipes and con-di-tio-nals) per file. Spawning an extra process for every file is costly, so this approach may not perform well on tens of thousands of files.

To mitigate the disadvantage, you can pass multiple filenames to the inner shell at once. This is what this answer does (the code has been debugged):

find . -iname "*.jpg" -type f -exec bash -c 'for i; do size=($(identify -format "%w %h" "$i")); (( size[0] < 300 || size[1] < 300 )) && rm -v "$i"; done' remove-files {} +

Note that -exec … + here is not a test that would trigger -delete: each shell process handles multiple files and returns a single exit status, so it is not a useful per-file test. Instead, rm is called conditionally from within the shell.

Still, there will be one identify per file and one rm per file to be deleted; on the other hand, there will be one bash per many files. For good performance you should strongly prefer shell builtins, shell arithmetic and shell syntax over external executables. This approach still processes filenames safely and reliably.
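
As one illustration of cutting the number of external calls, here is a variant of the loop above (a sketch, not taken from the linked answer; it keeps that code's 300-pixel thresholds and still runs one identify per file) that collects matching files in an array and calls rm only once per batch:

find . -iname "*.jpg" -type f -exec bash -c '
  small=()
  for f; do
    # read width and height from identify; skip files it cannot parse
    read -r w h < <(identify -format "%w %h" "$f" 2>/dev/null) || continue
    (( w < 300 || h < 300 )) && small+=("$f")
  done
  # a single rm invocation for the whole batch instead of one per matching file
  (( ${#small[@]} )) && rm -v -- "${small[@]}"
' remove-files {} +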


There are tools that can test many files with a single process. Example:

exiftool -q -r -if '$ImageHeight < 100' -if '$ImageWidth < 100' -p '$Directory/$FileName' .

(This is exiftool from the libimage-exiftool-perl package in Debian. Solution taken from this answer.)

Note this particular command is not limited to "PNGs and JPGs".

The command prints results much like find . … -print would. You can then pipe the output to xargs to call rm. The usual concerns about piping paths as plain text apply, and I'm not sure one can make exiftool act like find . … -print0.
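
For example, with GNU xargs one might do something like this (a sketch: splitting on newlines copes with spaces, but it still breaks on filenames that themselves contain newlines, and -d and -r are GNU extensions):

exiftool -q -r -if '$ImageHeight < 100' -if '$ImageWidth < 100' -p '$Directory/$FileName' . \
| xargs -r -d '\n' rm -v --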

So while this solution may perform well when it comes to finding files and printing their paths, it's not the most robust way to actually delete them without human supervision.