Checking images for similarity with OpenCV

Does OpenCV support the comparison of two images, returning some value (maybe a percentage) that indicates how similar these images are? E.g. 100% would be returned if the same image was passed twice, 0% would be returned if the images were totally different.

I already read a lot of similar topics here on StackOverflow. I also did quite some Googling. Sadly I couldn't come up with a satisfying answer.

Solution 1:

This is a huge topic, with answers from 3 lines of code to entire research magazines.

I will outline the most common such techniques and their results.

Comparing histograms

One of the simplest & fastest methods. Proposed decades ago as a means to find picture simmilarities. The idea is that a forest will have a lot of green, and a human face a lot of pink, or whatever. So, if you compare two pictures with forests, you'll get some simmilarity between histograms, because you have a lot of green in both.

Downside: it is too simplistic. A banana and a beach will look the same, as both are yellow.

OpenCV method: compareHist()

Template matching

A good example here matchTemplate finding good match. It convolves the search image with the one being search into. It is usually used to find smaller image parts in a bigger one.

Downsides: It only returns good results with identical images, same size & orientation.

OpenCV method: matchTemplate()

Feature matching

Considered one of the most efficient ways to do image search. A number of features are extracted from an image, in a way that guarantees the same features will be recognized again even when rotated, scaled or skewed. The features extracted this way can be matched against other image feature sets. Another image that has a high proportion of the features matching the first one is considered to be depicting the same scene.

Finding the homography between the two sets of points will allow you to also find the relative difference in shooting angle between the original pictures or the amount of overlapping.

There are a number of OpenCV tutorials/samples on this, and a nice video here. A whole OpenCV module (features2d) is dedicated to it.

Downsides: It may be slow. It is not perfect.

Over on the OpenCV Q&A site I am talking about the difference between feature descriptors, which are great when comparing whole images and texture descriptors, which are used to identify objects like human faces or cars in an image.

Solution 2:

If for matching identical images ( same size/orientation )

// Compare two images by getting the L2 error (square-root of sum of squared error).
double getSimilarity( const Mat A, const Mat B ) {
if ( A.rows > 0 && A.rows == B.rows && A.cols > 0 && A.cols == B.cols ) {
    // Calculate the L2 relative error between images.
    double errorL2 = norm( A, B, CV_L2 );
    // Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
    double similarity = errorL2 / (double)( A.rows * A.cols );
    return similarity;
}
else {
    //Images have a different size
    return 100000000.0;  // Return a bad value
}

Source

Solution 3:

Sam's solution should be sufficient. I've used combination of both histogram difference and template matching because not one method was working for me 100% of the times. I've given less importance to histogram method though. Here's how I've implemented in simple python script.

import cv2

class CompareImage(object):

    def __init__(self, image_1_path, image_2_path):
        self.minimum_commutative_image_diff = 1
        self.image_1_path = image_1_path
        self.image_2_path = image_2_path

    def compare_image(self):
        image_1 = cv2.imread(self.image_1_path, 0)
        image_2 = cv2.imread(self.image_2_path, 0)
        commutative_image_diff = self.get_image_difference(image_1, image_2)

        if commutative_image_diff < self.minimum_commutative_image_diff:
            print "Matched"
            return commutative_image_diff
        return 10000 //random failure value

    @staticmethod
    def get_image_difference(image_1, image_2):
        first_image_hist = cv2.calcHist([image_1], [0], None, [256], [0, 256])
        second_image_hist = cv2.calcHist([image_2], [0], None, [256], [0, 256])

        img_hist_diff = cv2.compareHist(first_image_hist, second_image_hist, cv2.HISTCMP_BHATTACHARYYA)
        img_template_probability_match = cv2.matchTemplate(first_image_hist, second_image_hist, cv2.TM_CCOEFF_NORMED)[0][0]
        img_template_diff = 1 - img_template_probability_match

        # taking only 10% of histogram diff, since it's less accurate than template method
        commutative_image_diff = (img_hist_diff / 10) + img_template_diff
        return commutative_image_diff


    if __name__ == '__main__':
        compare_image = CompareImage('image1/path', 'image2/path')
        image_difference = compare_image.compare_image()
        print image_difference

Solution 4:

A little bit off topic but useful is the pythonic numpy approach. Its robust and fast but just does compare pixels and not the objects or data the picture contains (and it requires images of same size and shape):

A very simple and fast approach to do this without openCV and any library for computer vision is to norm the picture arrays by

import numpy as np
picture1 = np.random.rand(100,100)
picture2 = np.random.rand(100,100)
picture1_norm = picture1/np.sqrt(np.sum(picture1**2))
picture2_norm = picture2/np.sqrt(np.sum(picture2**2))

After defining both normed pictures (or matrices) you can just sum over the multiplication of the pictures you like to compare:

1) If you compare similar pictures the sum will return 1:

In[1]: np.sum(picture1_norm**2)
Out[1]: 1.0

2) If they aren't similar, you'll get a value between 0 and 1 (a percentage if you multiply by 100):

In[2]: np.sum(picture2_norm*picture1_norm)
Out[2]: 0.75389941124629822

Please notice that if you have colored pictures you have to do this in all 3 dimensions or just compare a greyscaled version. I often have to compare huge amounts of pictures with arbitrary content and that's a really fast way to do so.