How can I measure the similarity between 2 strings? [closed]

Given two strings text1 and text2:

public SOMEUSABLERETURNTYPE Compare(string text1, string text2)
{
     // DO SOMETHING HERE TO COMPARE
}

Examples:

  1. First String: StackOverflow

    Second String: StaqOverflow

    Return: Similarity is 91%

    The return can be in % or something like that.

  2. First String: The simple text test

    Second String: The complex text test

    Return: The values can be considered equal

Any ideas? What is the best way to do this?


There are various different ways of doing this. Have a look at the Wikipedia "String similarity measures" page for links to other pages with algorithms.

I don't think any of those algorithms take sounds into consideration, however - so "staq overflow" would be as similar to "stack overflow" as "staw overflow" despite the first being more similar in terms of pronunciation.

I've just found another page which gives rather more options... in particular, the Soundex algorithm (Wikipedia) may be closer to what you're after.


Levenshtein distance is probably what you're looking for.