Detecting blank image files
Solution 1:
Use the identify tool from the ImageMagick command-line suite, documented here:
http://www.imagemagick.org/script/identify.php
With the command:
$ identify -format "%k" source.png
If the number of colors is 1, you have a blank page.
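A minimal sketch of this check as a reusable shell function, assuming ImageMagick's identify is on the PATH. The function name is_blank_image is hypothetical. It uses the %k escape, which prints the number of unique colors, and the [0] suffix so that only the first frame of a multi-page file is examined. An exact single-color test is likely to miss scanned pages, which tend to contain some noise.

```shell
# is_blank_image FILE
# Hypothetical helper: exits 0 if the first frame of FILE has exactly one
# unique color, 1 if it has more, and 2 if identify fails.
# Assumes ImageMagick's `identify` is installed and on the PATH.
is_blank_image() {
    local colors
    # %k = number of unique colors; "[0]" selects the first frame only
    colors=$(identify -format "%k" "${1}[0]") || return 2
    [ "$colors" = 1 ]
}

# Example usage:
# for f in *.png; do is_blank_image "$f" && echo "$f is blank"; done
```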
You can also use the command:
$ identify -verbose source.png
In the channel statistics it prints, the standard deviation, skewness and kurtosis will be 0 for a blank image.
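The same statistic can be read without parsing the verbose output. A minimal sketch, assuming ImageMagick's fx escapes are available: %[fx:standard_deviation] prints the standard deviation normalized to the range 0 to 1. The hypothetical function is_nearly_blank accepts a tolerance so that slightly noisy scans can still count as blank; its default of 0.01 is an arbitrary placeholder, not a tested value.

```shell
# is_nearly_blank FILE [TOLERANCE]
# Hypothetical helper: exits 0 if the normalized standard deviation of the
# first frame of FILE is at most TOLERANCE (default 0.01, a placeholder),
# 1 if it is larger, and 2 if identify fails or prints nothing.
# Assumes ImageMagick's `identify` supports %[fx:...] escapes.
is_nearly_blank() {
    local sd
    sd=$(identify -format "%[fx:standard_deviation]" "${1}[0]") || return 2
    [ -n "$sd" ] || return 2
    # Shell arithmetic is integer-only, so let awk compare the floats
    awk -v sd="$sd" -v tol="${2:-0.01}" 'BEGIN { exit !(sd + 0 <= tol + 0) }'
}
```

A solid-color image has a standard deviation of exactly 0, so it passes with any non-negative tolerance.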
Solution 2:
A slightly improved version of the code in the question:
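#!/bin/bash
mkdir -p "blanks"
for i in "$@"; do
    echo "${i}"
    # A hidden file ".<name>" next to the image protects it from removal
    if [[ -e $(dirname "$i")/.$(basename "$i") ]]; then
        echo " protected."
        continue
    fi
    # Threshold to pure black and white, then print the pixel count per color
    histogram=$(convert "${i}" -threshold 50% -format %c histogram:info:-)
    #echo $histogram
    white=$(echo "${histogram}" | grep "white" | cut -d: -f1)
    black=$(echo "${histogram}" | grep "black" | cut -d: -f1)
    # A color with no pixels has no histogram line at all
    if [[ -z "$black" ]]; then
        black=0
    fi
    if [[ -z "$white" ]]; then
        white=0
    fi
    # Blank if black/white < 0.005, written as a product so white=0 cannot divide by zero
    blank=$(echo "${black} < 0.005 * ${white}" | bc)
    #echo $white $black $blank
    if [ "${blank}" -eq "1" ]; then
        echo "${i} seems to be blank - removing it..."
        # basename: arguments may include a directory part
        mv "${i}" "blanks/$(basename "${i}")"
    fi
done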
Changes:
- Pass the images to check as arguments instead of reading them from a fixed location
- Progress reporting: each file name is printed as it is checked
- If the code doesn't detect a file correctly, you can give it a hint: create an empty file with the name of the image plus a dot in front. For example, to protect a.pnm, use touch .a.pnm
- Fixed error when there were no black pixels in the input
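The blank decision in the script depends only on the two pixel counts, so it can be tried out without any image at hand. A minimal sketch that moves it into a hypothetical function, histogram_is_blank, which reads histogram text on stdin; it assumes the usual "count: (r,g,b) #hex name" line layout of histogram:info: output and the color names white and black.

```shell
# histogram_is_blank [RATIO]
# Hypothetical helper: reads the output of
#   convert IMAGE -threshold 50% -format %c histogram:info:-
# on stdin and exits 0 if black pixels < RATIO * white pixels (default 0.005),
# the same rule as the script. With no white pixels it exits 1 (not blank).
histogram_is_blank() {
    awk -F: -v ratio="${1:-0.005}" '
        /white/ { white += $1 }   # $1 = pixel count in front of the colon
        /black/ { black += $1 }
        END { exit !(white > 0 && black < ratio * white) }
    '
}
```

Piping in a histogram with 9990 white and 10 black pixels therefore succeeds, since 10 < 0.005 * 9990 (about 50), while one with 9000 white and 1000 black pixels does not.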
Solution 3:
My trick is to scan the images to a losslessly compressed format (e.g. TIFF with lossless compression). This way, blank pages have a much smaller file size, so I can detect them with find, move them to another directory, check them quickly with a viewer and then get rid of them.
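The find step can be sketched as follows, assuming GNU or BSD find (for -maxdepth and -iname). The function name move_small_tiffs and its default size limit are hypothetical; the right limit depends on scanner resolution and compression, so it is best chosen by comparing the sizes of a few known blank and non-blank scans.

```shell
# move_small_tiffs DIR DEST [SIZE]
# Hypothetical helper: moves TIFF files directly inside DIR that are smaller
# than SIZE (find -size syntax, e.g. 20k or 20000c) into DEST for review.
# The 20k default is a placeholder, not a measured value.
move_small_tiffs() {
    local dir="$1" dest="$2" size="${3:-20k}"
    mkdir -p "$dest" || return 1
    # -maxdepth 1 keeps find out of subdirectories (including DEST);
    # find rounds sizes up to whole units before comparing, so use the
    # "c" (bytes) suffix for an exact limit
    find "$dir" -maxdepth 1 -type f \( -iname '*.tif' -o -iname '*.tiff' \) \
        -size "-$size" -exec mv {} "$dest"/ \;
}

# Example usage:
# move_small_tiffs scans scans/maybe-blank 20k
```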