Is there a list of characters that look similar to English letters?

This probably goes far deeper than you need, yet may still not be broad enough to cover your use case, but the Unicode Consortium has had to deal with attacks against internationalised domain names and came up with this list of homographs (characters with the same or similar rendering):

http://www.unicode.org/Public/security/latest/confusables.txt

It might make a starting point at least.
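If you want to use that file programmatically, here is a minimal sketch of reading it in Python. It assumes the data lines follow the `source ; target ; type # comment` layout, with code points written in hex; the sample lines below are written in that layout for illustration, so check the specific mappings against the real file. The names `parse_confusables` and `hex_to_str` are my own, not part of any library.

```python
import io
from collections import defaultdict

# Sample lines in the layout used by confusables.txt (illustrative only;
# the real file has thousands of entries).
SAMPLE = """\
# comment lines and blank lines are skipped

0435 ; 0065 ; MA # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
03BF ; 006F ; MA # GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
2161 ; 006C 006C ; MA # ROMAN NUMERAL TWO -> two LATIN SMALL LETTER L
"""


def hex_to_str(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusables(lines):
    """Map each target string to the set of sources that look like it."""
    lookalikes = defaultdict(set)
    for line in lines:
        # Drop the trailing comment, then skip blank or malformed lines.
        line = line.split("#", 1)[0].strip()
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue
        source, target = fields[0], fields[1]
        lookalikes[hex_to_str(target)].add(hex_to_str(source))
    return lookalikes


table = parse_confusables(io.StringIO(SAMPLE))
print(sorted(table["o"]))
```

For the real file, pass an open file object instead of the sample, for example `open("confusables.txt", encoding="utf-8-sig")`, so that a leading byte order mark is dropped. Note that a target can be more than one character long, which is why the table is keyed by strings rather than single characters.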


http://en.wikipedia.org/wiki/Letterlike_Symbols

It's much, much less comprehensive, but it is more comprehensible.


I created a Python class to do exactly this, based on Robin's Unicode link for "confusables":

https://github.com/wanderingstan/Confusables

For example, "Hello" would get expanded into the following set of regexp character classes:

[H\īŧ¨\ℋ\ℌ\ℍ\𝐇\đģ\đ‘¯\𝓗\đ•ŗ\𝖧\𝗛\𝘏\𝙃\𝙷\Η\𝚮\𝛨\đœĸ\𝝜\𝞖\Ⲏ\Н\áŽģ\á•ŧ\ꓧ\𐋏\⹧\Ōĸ\ÄĻ\Ķ‰\Ķ‡] [e\℮\īŊ…\ℯ\ⅇ\𝐞\𝑒\𝒆\𝓮\đ”ĸ\𝕖\𝖊\𝖾\𝗲\đ˜Ļ\𝙚\𝚎\ęŦ˛\Đĩ\ŌŊ\ɇ\Ōŋ] [l\‎\|\âˆŖ\âŊ\īŋ¨1\‎\Ûą\𐌠\‎\𝟏\𝟙\đŸŖ\𝟭\𝟷I\īŧŠ\Ⅰ\ℐ\ℑ\𝐈\đŧ\𝑰\𝓘\𝕀\𝕴\𝖨\𝗜\𝘐\𝙄\𝙸\Ɩ\īŊŒ\â…ŧ\ℓ\đĨ\𝑙\𝒍\𝓁\đ“ĩ\𝔩\𝕝\𝖑\𝗅\𝗹\𝘭\𝙡\𝚕\Į€\Ι\𝚰\đ›Ē\𝜤\𝝞\𝞘\Ⲓ\І\Ķ€\‎\‎\‎\‎\‎\‎\‎\‎\âĩ\ᛁ\ꓲ\đ–ŧ¨\𐊊\𐌉\‎\‎\ł\É­\Ɨ\ƚ\ÉĢ\‎\‎\‎\‎\ŀ\Äŋ\ᒷ\🄂\⒈\‎\⒓\ãĢ\㋋\㍤\⒔\ãŦ\ãĨ\⒕\㏭\ãĻ\⒖\㏎\㍧\⒗\㏯\㍨\⒘\㏰\㍊\⒙\ãą\ãĒ\⒚\ã˛\ãĢ\Į‰\IJ\‖\âˆĨ\Ⅱ\Į\‎\𐆙\⒒\â…ĸ\𐆘\ãĒ\㋊\ãŖ\ĐŽ\⒑\㏊\㋉\ãĸ\ĘĒ\â‚ļ\â…Ŗ\Ⅸ\ÉŽ\ĘĢ\㏠\㋀\㍙] [l\‎\|\âˆŖ\âŊ\īŋ¨1\‎\Ûą\𐌠\‎\𝟏\𝟙\đŸŖ\𝟭\𝟷I\īŧŠ\Ⅰ\ℐ\ℑ\𝐈\đŧ\𝑰\𝓘\𝕀\𝕴\𝖨\𝗜\𝘐\𝙄\𝙸\Ɩ\īŊŒ\â…ŧ\ℓ\đĨ\𝑙\𝒍\𝓁\đ“ĩ\𝔩\𝕝\𝖑\𝗅\𝗹\𝘭\𝙡\𝚕\Į€\Ι\𝚰\đ›Ē\𝜤\𝝞\𝞘\Ⲓ\І\Ķ€\‎\‎\‎\‎\‎\‎\‎\‎\âĩ\ᛁ\ꓲ\đ–ŧ¨\𐊊\𐌉\‎\‎\ł\É­\Ɨ\ƚ\ÉĢ\‎\‎\‎\‎\ŀ\Äŋ\ᒷ\🄂\⒈\‎\⒓\ãĢ\㋋\㍤\⒔\ãŦ\ãĨ\⒕\㏭\ãĻ\⒖\㏎\㍧\⒗\㏯\㍨\⒘\㏰\㍊\⒙\ãą\ãĒ\⒚\ã˛\ãĢ\Į‰\IJ\‖\âˆĨ\Ⅱ\Į\‎\𐆙\⒒\â…ĸ\𐆘\ãĒ\㋊\ãŖ\ĐŽ\⒑\㏊\㋉\ãĸ\ĘĒ\â‚ļ\â…Ŗ\Ⅸ\ÉŽ\ĘĢ\㏠\㋀\㍙] [o\ā°‚\ā˛‚\ā´‚\āļ‚\āĨĻ\āŠĻ\āĢĻ\ā¯Ļ\āąĻ\āŗĻ\āĩĻ\āš\āģ\၀\‎\Ûĩ\īŊ\ℴ\𝐨\𝑜\𝒐\𝓸\đ”Ŧ\𝕠\𝖔\𝗈\đ—ŧ\𝘰\𝙤\𝚘\ᴏ\ᴑ\ęŦŊ\Îŋ\𝛐\𝜊\𝝄\𝝾\𝞸\Īƒ\𝛔\𝜎\𝝈\𝞂\đžŧ\ⲟ\Đž\áƒŋ\օ\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\ā´ \ဝ\đ“Ē\đ‘Ŗˆ\đ‘Ŗ—\đŦ\‎\ø\ęŦž\Éĩ\ꝋ\ĶŠ\Ņŗ\ꮎ\ęŽģ\ę­´\‎\ÆĄ\œ\Éļ\∞\ꝏ\ꚙ\āĩŸ\တ]

This regexp will match against "𝓗℮𝐥1೦".
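To show the idea on a small scale, here is a rough sketch of the expansion, not the actual code or API of the Confusables class. It assumes a tiny hand-picked lookalike table (`LOOKALIKES` and `confusable_regex` are hypothetical names); a real table would be generated from confusables.txt.

```python
import re

# Tiny hand-picked table for illustration only (hypothetical subset).
LOOKALIKES = {
    "H": "\U0001D4D7\u041D",    # script capital H, Cyrillic En
    "e": "\u212E\u0435",        # estimated symbol, Cyrillic Ie
    "l": "\U0001D425" + "1I",   # bold small l, digit one, capital I
    "o": "\u0CE6\u03BF\u043E",  # Kannada zero, Greek omicron, Cyrillic o
}


def confusable_regex(word):
    """Build a regexp with one character class per character of `word`."""
    parts = []
    for ch in word:
        variants = ch + LOOKALIKES.get(ch, "")
        # re.escape keeps characters such as ']' or '-' literal in the class.
        parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    return re.compile("".join(parts))


pattern = confusable_regex("Hello")
# Script H, estimated symbol, bold l, digit one, Kannada zero.
disguised = "\U0001D4D7\u212E\U0001D425" + "1" + "\u0CE6"
print(bool(pattern.fullmatch(disguised)))
```

Characters without an entry in the table simply get a one-character class, so plain text still matches itself.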