Excel 2013 Fuzzy Lookup to find near-duplicate text
I have a list of captions with a large number of near-duplicates. For example:
- Birthday for Her
- For Her Birthday
- Birthday - For Her
- For Her / Birthday
I was looking into Fuzzy Lookup as a way of highlighting these near-duplicates.
The Fuzzy Lookup Add-In for Excel performs fuzzy matching of textual data in Excel.
Fuzzy Lookup Add-In for Excel
The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel.
It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data.
For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match.
While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages.
Source: Fuzzy Lookup Add-In for Excel
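To see why the captions in the question count as near-duplicates, it can help to compute a similarity score by hand. The sketch below uses token-set Jaccard similarity (shared words divided by total distinct words, ignoring case and punctuation). This is a swapped-in illustration, not the add-in's actual scoring, and the helper names `tokens` and `jaccard` are hypothetical. It shows that once word order and separators such as `-` and `/` are ignored, all four captions score 1.0 against each other.

```python
import re
from itertools import combinations


def tokens(text: str) -> set:
    """Lowercase word tokens; punctuation such as '-' and '/' is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity |A & B| / |A | B|, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(union)


captions = [
    "Birthday for Her",
    "For Her Birthday",
    "Birthday - For Her",
    "For Her / Birthday",
]

# Score every unordered pair of captions.
for a, b in combinations(captions, 2):
    print(f"{jaccard(a, b):.2f}  {a!r} vs {b!r}")
```

A caption that differs by one word scores lower: `jaccard("Birthday for Her", "Birthday for Him")` shares 2 of 4 distinct words, giving 0.5.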
Any suggestions on the Similarity Threshold configuration?
Performing Fuzzy Lookups in Excel has some hints on Similarity Threshold configuration.
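As a rough intuition for what the threshold does, the sketch below self-joins a single list and keeps only pairs scoring at or above a cutoff, which is conceptually similar to finding fuzzy duplicates within one table. It again uses token-set Jaccard similarity as a stand-in for the add-in's own measure, so the scores here will not match Excel's. The function `near_duplicate_pairs` and the extra captions "Birthday for Him" and "Anniversary Gift" are hypothetical. A high threshold keeps only the four reordered variants; lowering it also pulls in "Birthday for Him", which may or may not be wanted.

```python
import re
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard similarity on lowercase words (punctuation ignored).
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def near_duplicate_pairs(items, threshold):
    """Self-join: return every unordered pair whose score is >= threshold."""
    pairs = []
    for a, b in combinations(items, 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


captions = [
    "Birthday for Her",
    "For Her Birthday",
    "Birthday - For Her",
    "For Her / Birthday",
    "Birthday for Him",
    "Anniversary Gift",
]

# Compare how many pairs survive at a strict and a loose cutoff.
for threshold in (0.8, 0.5):
    print(threshold, len(near_duplicate_pairs(captions, threshold)))
```

In practice it is usually easiest to start with a high threshold, review the matches, and lower it step by step until false positives start to appear.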