Solution 1:

Use the identify command of the ImageMagick CLI, documented here:

http://www.imagemagick.org/script/identify.php

With command:

$ identify -format "%k" source.png

The %k escape prints the number of unique colors in the image. If that number is 1, you have a blank page.
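To apply this check to many files, the color count can be wrapped in a small function. This is only a sketch: the name is_blank_page is made up for illustration, and it assumes single-image files and that identify's %k escape reports the unique color count.

```shell
# Hypothetical helper (not part of ImageMagick): succeeds when the image
# contains exactly one unique color, i.e. the page is perfectly uniform.
is_blank_page() {
    local colors
    # %k prints the number of unique colors in the image
    colors=$(identify -format '%k' "$1") || return 2
    [ "$colors" = 1 ]
}
```

It could then be used as: for f in *.png; do is_blank_page "$f" && echo "$f is blank"; done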

You can also use the command:

$ identify -verbose source.png

In the channel statistics of the output, the standard deviation, skewness and kurtosis will all be 0 for a blank image.
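Scanned pages usually carry some noise, so an exact 0 is unlikely in practice and a small tolerance may work better. The sketch below reads the normalized standard deviation through an fx expression and compares it to a threshold. The function name is_nearly_uniform and the default threshold of 0.01 are assumptions chosen for illustration, not values from ImageMagick, and single-image files are assumed.

```shell
# Hypothetical helper: succeeds when the standard deviation of the image
# (normalized to the range 0..1) is below a threshold.
is_nearly_uniform() {
    local file threshold sd
    file=$1
    threshold=${2:-0.01}   # assumed default, tune it for your scanner
    sd=$(identify -format '%[fx:standard_deviation]' "$file") || return 2
    # The shell has no floating point arithmetic, so let awk compare
    awk -v sd="$sd" -v t="$threshold" 'BEGIN { exit !(sd < t) }'
}
```

Usage: is_nearly_uniform source.png 0.02 && echo "probably blank"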

Solution 2:

Slightly improved version of the code in the question:

#!/bin/bash

mkdir -p "blanks"

for i in "$@"; do
    echo "${i}"
    if [[ -e $(dirname "$i")/.$(basename "$i") ]]; then
        echo "   protected."
        continue
    fi

    histogram=$(convert "${i}" -threshold 50% -format %c histogram:info:-)
    #echo $histogram
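    white=$(echo "${histogram}" | grep "white" | cut -d: -f1)
    black=$(echo "${histogram}" | grep "black" | cut -d: -f1)
    # A color missing from the histogram means zero pixels of that color
    if [[ -z "$black" ]]; then
        black=0
    fi
    # Without white pixels the ratio below would divide by nothing
    if [[ -z "$white" ]]; then
        echo "   no white pixels, not blank."
        continue
    fi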

    blank=$(echo "scale=4; ${black}/${white} < 0.005" | bc)
    #echo $white $black $blank
    if [ "${blank}" -eq "1" ]; then
        echo "${i} seems to be blank - moving it to blanks/..."
        # Use the base name so that arguments with a directory part also work
        mv "${i}" "blanks/$(basename "${i}")"
    fi
done

Changes:

  • Pass the images to check as arguments instead of reading them from a fixed location
  • Print each file name as a progress report
  • If the code misclassifies a file, you can protect it with a hint: create an empty file named after the image with a dot in front (e.g. to protect a.pnm, run touch .a.pnm in the same directory)
  • Fixed an error when the input contained no black pixels

Solution 3:

My trick is to scan the images to a losslessly compressed format (TIFF with compression). Blank pages then have a much smaller file size, so I can detect them with find, move them to another directory, check them quickly in a viewer, and then delete them.
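The find step could look roughly like this. The function name move_small_scans, the 50 KiB limit and the blanks directory are assumptions for illustration. The right limit depends on resolution, page size and compression, so compare a few known blank and non-blank scans first. Note that -maxdepth is a GNU/BSD extension to find.

```shell
# Hypothetical helper: move TIFF files below a size limit (in KiB) from a
# directory into a separate directory for manual review.
move_small_scans() {
    local dir limit_kb dest
    dir=$1
    limit_kb=${2:-50}      # assumed default limit
    dest=${3:-blanks}      # assumed review directory
    mkdir -p "$dest"
    # -size -Nk matches files whose size, rounded up to whole KiB, is below N
    find "$dir" -maxdepth 1 -type f -name '*.tif*' \
        -size -"${limit_kb}"k -exec mv -- {} "$dest"/ \;
}
```

Usage: move_small_scans . 50 blanks, then look through blanks/ in a viewer before deleting anything.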