You could try to use local features like SIFT here: http://en.wikipedia.org/wiki/Scale-invariant_feature_transform

It should work because a logo's shape is usually constant, so the extracted features should match well.

The workflow will be like this:

  1. Detect corners (e.g. with the Harris corner detector). For the Nike logo these are the two sharp ends.

  2. Compute descriptors (like SIFT - 128D integer vector)

  3. At the training stage, store them; at the matching stage, find the nearest neighbours of every feature in the database built during training. You end up with a set of matches (some of them are probably wrong).

  4. Weed out wrong matches using RANSAC. This gives you the matrix that describes the transform from the ideal logo image to the image where you found the logo. Depending on the settings, you can allow different kinds of transforms (translation only; translation and rotation; affine transform).
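
To make steps 3 and 4 more concrete, here is a minimal sketch in plain Python. It assumes you already have keypoint positions and descriptors from some detector, and it only handles the simplest case mentioned above (translation only). The function names, the tolerance `tol` and the iteration count are my own illustrative choices, not part of any library.

```python
import math
import random


def nearest_neighbour_matches(query_desc, train_desc):
    """For each query descriptor, find the index of the closest stored descriptor.

    Brute-force search with Euclidean distance; a real system would likely
    use an approximate nearest-neighbour index instead.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        best = min(range(len(train_desc)),
                   key=lambda ti: math.dist(q, train_desc[ti]))
        matches.append((qi, best))
    return matches


def ransac_translation(src_pts, dst_pts, n_iters=200, tol=3.0, seed=0):
    """Estimate a translation dst ~ src + t from matches that may be wrong.

    src_pts[k] and dst_pts[k] are the (x, y) positions of the k-th match.
    Returns (translation, inlier_indices).
    """
    rng = random.Random(seed)
    best_t, best_inliers = (0.0, 0.0), []
    for _ in range(n_iters):
        # A single match is enough to hypothesise a translation.
        i = rng.randrange(len(src_pts))
        tx = dst_pts[i][0] - src_pts[i][0]
        ty = dst_pts[i][1] - src_pts[i][1]
        # Count the matches that agree with this hypothesis.
        inliers = [k for k, (s, d) in enumerate(zip(src_pts, dst_pts))
                   if math.hypot(s[0] + tx - d[0], s[1] + ty - d[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = (tx, ty), inliers
    if best_inliers:
        # Refine: average displacement over all inliers of the best hypothesis.
        n = len(best_inliers)
        best_t = (sum(dst_pts[k][0] - src_pts[k][0] for k in best_inliers) / n,
                  sum(dst_pts[k][1] - src_pts[k][1] for k in best_inliers) / n)
    return best_t, best_inliers
```

For rotation or affine transforms the idea is the same, but each hypothesis needs more matches (two or three instead of one) and the model fit is a small least-squares problem.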

Szeliski's book has a chapter (4.1) on local features. http://research.microsoft.com/en-us/um/people/szeliski/Book/

P.S.

  1. I assumed you want to find logos in photos, for example all Pepsi billboards, where the logo may be distorted. If you need to find a TV channel logo on the screen (so it is neither rotated nor scaled), you can do it more easily (pattern matching or something similar).

  2. Conventional SIFT does not consider color information. Since logos usually have constant colors (though the exact color depends on lighting and the camera), you might want to take color information into account somehow.
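
One possible way to bring color in (my own suggestion, not something SIFT provides) is to compare hue histograms of the candidate region and the reference logo. Hue changes less with brightness than raw RGB does, which helps a little with lighting. The bin count and saturation cut-off below are arbitrary assumptions.

```python
import colorsys


def hue_histogram(pixels, bins=12, min_saturation=0.2):
    """Normalized hue histogram of (r, g, b) pixels with values in 0..255.

    Near-grey pixels are skipped because their hue is not meaningful.
    """
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue
        hist[min(int(h * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

You could use the score to reject geometric matches whose colors are clearly off, for instance a red logo matched onto a blue region.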


We worked on logo detection/recognition in real-world images. We also created a dataset, FlickrLogos-32, and made it publicly available, including data, ground truth and evaluation scripts.

In our work we treated logo recognition as a retrieval problem to simplify multi-class recognition and to allow such systems to scale easily to many (e.g. thousands of) logo classes.

Recently, we developed a bundling technique called Bundle min-Hashing that aggregates spatial configurations of multiple local features into highly distinctive feature bundles. The bundle representation is usable for both retrieval and recognition. See the following example heatmaps for logo detections:

[Images: two example heatmaps of logo detections]

You will find more details on the internal operations, potential applications of the approach, experiments on its performance and many references to related work in the papers [1][2].


I worked on that: "Trademark matching and retrieval in sports video databases". You can get a PDF of the paper here: http://scholar.google.it/scholar?cluster=9926471658203167449&hl=en&as_sdt=2000

We used SIFT as trademark and image descriptors, and a normalized threshold matching to compute the distance between models and images. In our latest work we have been able to greatly reduce computation using meta-models, created by evaluating the relevance of the SIFT points that are present in different versions of the same trademark.

I'd say that in general working with videos is harder than working with photos, due to the very bad visual quality of the TV standards currently used.

Marco


I worked on a project where we had to do something very similar. At first I tried Haar training with OpenCV.

It worked, but was not an optimal solution for our needs. Our source images (where we were looking for the logo) were a fixed size and contained only the logo. Because of this we were able to use cvMatchShapes against a known good match and compare the returned value to decide whether we had a good match.
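
To give an idea of what this kind of shape comparison does, here is a simplified sketch in plain Python. cvMatchShapes is based on Hu moment invariants; the code below is not OpenCV's implementation. It uses only the first two invariants and a plain absolute difference, and the function names are made up for illustration. It assumes non-empty binary masks given as lists of rows of 0/1.

```python
def hu_first_two(mask):
    """First two Hu moment invariants of a non-empty binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = float(len(pts))
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment, normalized so that it does not depend on scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return (n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2)


def shape_distance(mask_a, mask_b):
    """Smaller means more similar; 0.0 for shapes with identical invariants."""
    a, b = hu_first_two(mask_a), hu_first_two(mask_b)
    return sum(abs(x - y) for x, y in zip(a, b))
```

In practice you would compute the distance between the candidate and the known good logo, then accept the candidate if the value falls below a threshold tuned on your own images.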