Algorithm for finding similar images

Solution 1:

I have done something similar, by decomposing images into signatures using wavelet transform.

My approach was to pick the most significant n coefficients from each transformed channel, and recording their location. This was done by sorting the list of (power,location) tuples according to abs(power). Similar images will share similarities in that they will have significant coefficients in the same places.

I found it was best to transform in the image into YUV format, which effectively allows you weight similarity in shape (Y channel) and colour (UV channels).

You can in find my implementation of the above in mactorii, which unfortunately I haven't been working on as much as I should have :-)

Another method, which some friends of mine have used with surprisingly good results, is to simply resize your image down to say, a 4x4 pixel and store that as your signature. How similar 2 images are can be scored by say, computing the Manhattan distance between the 2 images, using corresponding pixels. I don't have the details of how they performed the resizing, so you may have to play with the various algorithms available for that task to find one which is suitable.

Solution 2:

pHash might interest you.

perceptual hash n. a fingerprint of an audio, video or image file that is mathematically based on the audio or visual content contained within. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the inputs are visually or auditorily similar.