Comparing two histograms
Comparing histograms is quite a subject in itself.
You've got two big classes of comparison functions : bin-to-bin comparison and cross-bin comparison.
- Bin-to-bin comparison : As you stated, standard sum of differences is quite bad. There's an improvement: the Chi-squared distance. If
H1.red[0] = 0.001
andH2.red[0] = 0.011
, thenH2.red[0]
is much more important than ifH1.red[0] = 0.1
andH2.red[0] = 0.11
, even though in both cases|H1.red[0] - H2.red[0]| = 0.01
. - Cross-bin comparison : A standard example called the bin-similarity matrix requires some similarity matrix
M
where inM(i,j)
is the similarity between the binsi
andj
. Assumebin[i]
is red. Ifbin[j]
is dark red, thenM(i,j)
is large. Ifbin[j]
is green,M(i,j)
is small. Then, the distance between histogramsH1
andH2
would besqrt((H1-H2)*M*(H1-H2))
. This method takes in account what you've said about "close" bins! Earth Moving Distance (EMD) is another kind of cross-bin distance.
To finish, I've got three points :
- You should read this paper on histogram distance. It's quite easy and introduces you to histogram distances. All the distances I talked about are summed up nicely in chapter 1. Honestly, the final thing described in the article is not that complex, but it's probably overkill for your case.
- Cross-bin distance is very good, but can be costly (i.e : long to compute, because it involves a matrix, thus is O(n^2)). The simplest way to circumvent the expensive cross-bin computation (and it is widely done) is to do some soft assignment : if a pixel is red, then you should fill ALL the bins that are remotely looking like red (of course, giving more weight to the closest colors). Then you can use a bin-to-bin algorithm.
- A bit more math-centric : the previous point was all about reducing a cross-bin comparison to a bin-to-bin comparison. In fact, it consists of implicitly diagonalizing the similarity matrix M. If you can diagonalize
M = P'*D*P
whereP'
is the transpose ofP
, thensqrt((H1-H2)'*M*(H1-H2)) = sqrt((H1-H2)'*P'*D*P*(H1-H2)) = sqrt((P(H1-H2))'*D*(P(H1-H2)))
. Depending on how trivial it is for you to computeP(H1-H2)
, this can save you computation time. Intuitively, ifH1
is your original histogram,P*H1
is a soft assignment and you are using the implicit similarity matrixM = P'*Id*P
.
I'm surprised that no one mentioned opencv implementation of the histogram comparison, and can easily handle multichannel images (grayscale, rgb, rgba, etc) of different format (uchar, float, double, etc)
Includes the Bhattacharyya distance, Chi-Square, correlation and intersection methods. You can find the
compareHist(InputArray H1, InputArray H2, int method)
function in the manual here.