How can I measure the similarity between two strings?
Given two strings, text1 and text2:
public SOMEUSABLERETURNTYPE Compare(string text1, string text2)
{
// DO SOMETHING HERE TO COMPARE
}
Examples:
- First String: StackOverflow
  Second String: StaqOverflow
  Return: Similarity is 91%
The return value can be a percentage or something similar.
- First String: The simple text test
  Second String: The complex text test
  Return: The values can be considered equal
Any ideas? What is the best way to do this?
There are several ways of doing this. Have a look at the Wikipedia "String similarity measures" page for links to pages describing the individual algorithms.
I don't think any of those algorithms take sounds into consideration, however - so "staq overflow" would be as similar to "stack overflow" as "staw overflow" despite the first being more similar in terms of pronunciation.
I've just found another page which gives rather more options... in particular, the Soundex algorithm (Wikipedia) may be closer to what you're after.
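To make the phonetic idea concrete, here is a minimal sketch of the classic American Soundex encoding in C#. The class and method names (PhoneticSimilarity, Soundex) are made up for illustration, and the sketch assumes plain ASCII letters and a single word at a time; anything else is skipped.

```csharp
using System;
using System.Text;

public static class PhoneticSimilarity
{
    // Maps an uppercase letter to its Soundex digit; '0' means "not coded"
    // (vowels, H, W, Y).
    private static char Code(char c)
    {
        switch (c)
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    // Returns a four-character code: first letter plus three digits,
    // or an empty string if the input contains no ASCII letters.
    public static string Soundex(string word)
    {
        if (word == null) throw new ArgumentNullException(nameof(word));

        var result = new StringBuilder();
        char lastCode = '0';

        foreach (char raw in word)
        {
            char c = char.ToUpperInvariant(raw);
            if (c < 'A' || c > 'Z') continue;   // assumption: ignore non-ASCII input

            char code = Code(c);
            if (result.Length == 0)
            {
                result.Append(c);               // keep the first letter as-is
                lastCode = code;
            }
            else if (c == 'H' || c == 'W')
            {
                continue;                       // H and W do not separate equal codes
            }
            else if (code == '0')
            {
                lastCode = '0';                 // a vowel separates equal codes
            }
            else
            {
                if (code != lastCode) result.Append(code);
                lastCode = code;
            }

            if (result.Length == 4) break;
        }

        return result.Length == 0 ? "" : result.ToString().PadRight(4, '0');
    }
}
```

With this sketch, "stack" and "staq" receive the same code while "staw" gets a different one, which is the distinction that edit-distance measures miss. Soundex only gives a match/no-match answer, so it complements a percentage-style measure rather than replacing it.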
Levenshtein distance is probably what you're looking for.
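As a rough sketch of how that could fill in the Compare method from the question, the following C# computes the Levenshtein distance with the usual dynamic-programming table (kept to two rows) and turns it into a score between 0 and 1. The class name StringSimilarity and the normalization (dividing by the longer string's length) are assumptions for illustration; other normalizations are equally reasonable.

```csharp
using System;

public static class StringSimilarity
{
    // Number of single-character insertions, deletions, and substitutions
    // needed to turn a into b.
    public static int LevenshteinDistance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }

            // Swap rows so "previous" holds the row just computed.
            int[] temp = previous;
            previous = current;
            current = temp;
        }

        return previous[b.Length];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when every
    // character position differs.
    public static double Compare(string text1, string text2)
    {
        int maxLength = Math.Max(text1.Length, text2.Length);
        if (maxLength == 0) return 1.0;
        return 1.0 - (double)LevenshteinDistance(text1, text2) / maxLength;
    }
}
```

For "StackOverflow" and "StaqOverflow" the distance is 2 (substitute one character, delete one), so this particular normalization gives roughly 0.85 rather than the 91% in the question. Whether two strings "can be considered equal" is then a matter of picking a threshold that suits your data.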