Database of common name aliases / nicknames of people

I'm involved with a SQL / .NET project that will be searching through a list of names. I'm looking for a way to return some results on similar first names of people. If searching for "Tom" the results would include Thom, Thomas, etc. It is not important whether this be a file or a web service. Example Design:

Table "Names" has Name and NameID
Table "Nicknames" has Nickname, NicknameID and NameID

Example output:

You searched for "John Smith"
You show results Jon Smith, Jonathan Smith, Johnny Smith, ...

Are there any databases out there (public or paid) suited to this type of task to populate a relationship between nicknames and names?


Solution 1:

I'm adding another source for anyone who comes across this question via Google. This project provides a very good lookup for this purpose.

https://github.com/carltonnorthern/nickname-and-diminutive-names-lookup

It's somewhat simpler and less complete than pdNickName but on the other hand it's free and easy to use.

Solution 2:

A google search on "Database of Nicknames" turned up pdNickName (for pay).

In addition, I think you only need a single table for this job, not two, with NameID, Name, and MasterNameID. All the nicknames go into the Name column. One name is considered the "canonical" one. All the nickname records use the MasterNameID column to point back to that record, with the canonical name pointing to itself.

Your two table schema contains no additional information and, depending on how you fill in the nickname table, you might need extra code to handle the canonical cases.

Solution 3:

I just found this site.

It looks like you could script it pretty easily.

http://www.behindthename.com/php/extra.php?terms=steve&extra=r&gender=m

I just wish I could auto narrow this to english..

Solution 4:

Another commercial name matching database is: http://www.basistech.com/name-indexer/

It looks quite professional (though potentially expensive).

They claim to support the following languages:
Arabic, Chinese (Simplified), Chinese (Traditional), Persian (Farsi / Dari), English, Japanese, Korean, Pashto, Russian, Urdu

Solution 5:

Here is a github repo with csv of related names, and you can contribute back:

The first few lines show the format:

aaron,ron
abel,abe
abednego,bedney
abijah,ab,bige
abigail,ab,abbie,abby,gail
abner,ab,abbie,abby
abraham,abe,abram,bram
absalom,ab,abbie,app