How to use SIFT algorithm to compute how similar two images are?
I have used the SIFT implementation of Andrea Vedaldi, to calculate the sift descriptors of two similar images (the second image is actually a zoomed in picture of the same object from a different angle).
Now I am not able to figure out how to compare the descriptors to tell how similar the images are?
I know that this question is not answerable unless you have actually played with these sort of things before, but I thought that somebody who has done this before might know this, so I posted the question.
the little I did to generate the descriptors:
>> i=imread('p1.jpg');
>> j=imread('p2.jpg');
>> i=rgb2gray(i);
>> j=rgb2gray(j);
>> [a, b]=sift(i); % a has the frames and b has the descriptors
>> [c, d]=sift(j);
Solution 1:
First, aren't you supposed to be using vl_sift instead of sift?
Second, you can use SIFT feature matching to find correspondences in the two images. Here's some sample code:
I = imread('p1.jpg');
J = imread('p2.jpg');
I = single(rgb2gray(I)); % Conversion to single is recommended
J = single(rgb2gray(J)); % in the documentation
[F1 D1] = vl_sift(I);
[F2 D2] = vl_sift(J);
% Where 1.5 = ratio between euclidean distance of NN2/NN1
[matches score] = vl_ubcmatch(D1,D2,1.5);
subplot(1,2,1);
imshow(uint8(I));
hold on;
plot(F1(1,matches(1,:)),F1(2,matches(1,:)),'b*');
subplot(1,2,2);
imshow(uint8(J));
hold on;
plot(F2(1,matches(2,:)),F2(2,matches(2,:)),'r*');
vl_ubcmatch() essentially does the following:
Suppose you have a point P in F1 and you want to find the "best" match in F2. One way to do that is to compare the descriptor of P in F1 to all the descriptors in D2. By compare, I mean find the Euclidean distance (or the L2-norm of the difference of the two descriptors).
Then, I find two points in F2, say U & V which have the lowest and second-lowest distance (say, Du and Dv) from P respectively.
Here's what Lowe recommended: if Dv/Du >= threshold (I used 1.5 in the sample code), then this match is acceptable; otherwise, it's ambiguously matched and is rejected as a correspondence and we don't match any point in F2 to P. Essentially, if there's a big difference between the best and second-best matches, you can expect this to be a quality match.
This is important since there's a lot of scope for ambiguous matches in an image: imagine matching points in a lake or a building with several windows, the descriptors can look very similar but the correspondence is obviously wrong.
You can do the matching in any number of ways .. you can do it yourself very easily with MATLAB or you can speed it up by using a KD-tree or an approximate nearest number search like FLANN which has been implemented in OpenCV.
EDIT: Also, there are several kd-tree implementations in MATLAB.
Solution 2:
You should read David Lowe's paper, which talks about how to do exactly that. It should be sufficient, if you want to compare images of the exact same object. If you want to match images of different objects of the same category (e.g. cars or airplanes) you may want to look at the Pyramid Match Kernel by Grauman and Darrell.
Solution 3:
Try to compare each descriptor from the first image with descriptors from the second one situated in a close vicinity (using the Euclidean distance). Thus, you assign a score to each descriptor from the first image based on the degree of similarity between it and the most similar neighbor descriptor from the second image. A statistical measure (sum, mean, dispersion, mean error, etc) of all these scores gives you an estimate of how similar the images are. Experiment with different combinations of vicinity size and statistical measure to give you the best answer.
Solution 4:
If you want just compare zoomed and rotated image with known center of rotation you can use phase correlation in log-polar coordinates. By sharpness of peak and histogram of phase correlation you can judge how close images are. You can also use euclidean distance on absolute value of Fourier coefficients.
If you want compare SIFT descriptor, beside euclidean distance you can also use "diffuse distance" - getting descriptor on progressively more rough scale and concatenating them with original descriptor. That way "large scale" feature similarity would have more weight.