Excel 2013 Fuzzy Lookup to find near-duplicate text

I have a list of captions with a large number of near-duplicates. For example:

  • Birthday for Her
  • For Her Birthday
  • Birthday - For Her
  • For Her / Birthday

I was looking into Fuzzy Lookup as a way of highlighting these near-duplicates.


The Fuzzy Lookup Add-In for Excel performs fuzzy matching of textual data in Excel.


Fuzzy Lookup Add-In for Excel

The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel.

It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data.

For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match.

While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages.

Source: Fuzzy Lookup Add-In for Excel


Any suggestions on the Similarity Threshold configuration?

The article "Performing Fuzzy Lookups in Excel" has some hints on Similarity Threshold configuration.
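
To get a feel for how a threshold interacts with captions like the ones in the question, here is a rough Python sketch. It is not the add-in's algorithm: I'm assuming a plain token-set Jaccard score as a stand-in, and the function names and the extra caption "Birthday for Him" are made up for illustration.

```python
import re
from itertools import combinations


def tokens(caption):
    """Lowercase the caption and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", caption.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity: shared words / all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty captions are treated as identical
    return len(ta & tb) / len(ta | tb)


def near_duplicates(captions, threshold=0.8):
    """Return (caption_a, caption_b, score) for pairs scoring >= threshold."""
    pairs = []
    for a, b in combinations(captions, 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


captions = [
    "Birthday for Her",
    "For Her Birthday",
    "Birthday - For Her",
    "For Her / Birthday",
    "Birthday for Him",  # hypothetical caption, not from the question
]

for a, b, score in near_duplicates(captions, threshold=0.8):
    print(f"{score:.2f}  {a!r} ~ {b!r}")
```

With this scoring the four captions from the question score 1.0 against each other because they contain the same words, while the made-up "Birthday for Him" scores 0.5 against each of them, so any threshold between those two values separates them. The add-in's own scores will differ, so treat the numbers only as a way to reason about where to put the cutoff.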