Extracting email from text string in excel/google spreadsheet

I'm looking to extract email addresses from text strings in Google Docs/Excel and email them automatically, where the content of the cells are currently unknown (until updated by user).

For example, they might read:

  1. Hi,

    My friend [email protected].

    Thanks,

    Example Joe

OR

  1. You should ask

    [email protected]

I've currently got this formula:

=IFERROR((LEFT(CELL,FIND("@",CELL)-1))&"@"&(regexextract(CELL,"@(.*)"))) 

It works for almost all cases, except

a) as in example 1, where someone puts a , or . at the end of the email

b) as in example 2, where the email starts on a new line, it comes in as eg.

ask

[email protected]

How might I adapt the formula to correct these?


Solution 1:

For Google Spreadsheets

Google Spreadsheets has these cool already-builtin regex formulas

  • REGEXEXTRACT, REGEXMATCH, REGEXREPLACE

We use the first one to extract mail adresses. Put this formula in a Google Spreadsheet cell:

=iferror(Regexextract(A1;"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}");"")
  • A1 is the cell where the string to check (mail body in your case) should reside
  • Complete formula consist of 2 parts. The inner part is the regex formula and the outer part is for error prevention
  • Regexextract(A1,"\[A-z0-9._%+-\]+@\[A-z0-9.-\]+\.\[A-z\]{2,4}") returns the mail address
  • iferror(innerformula,"") prevents #N/A when the regex formula wasn't able to return anything e.g no valid mail address was found

enter image description here

How does the regex pattern work?

[A-z0-9._%+-]+@[A-z0-9.-]+.[A-z]{2,4}

  • A-z represents any character between an A and a z.
    Note the uppercase and lowercase. This way it's case-insensitive
  • 0-9 represents any digit
  • ._%+- represent those signs itself
  • [ ] represents a single character which is allowed inside the brackets
  • Putting a + sign behind [ ] allows the previous pattern to be repeated infinitely
  • @ has no special meaning. It literally searches for a @ sign
  • [A-z0-9.-]+ is the same as above. But _%+ are not allowed behind a @ sign this time
  • \. searches for a single point. It has to be escaped with a preceeding \ because . normally is a placeholder any character
  • At last [A-z]{2,4} searches for 2,3 or 4 case-insensitive characters

Used ressources

  • http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

Solution 2:

This is for Excel.

Consider the following User Defined Function (UDF):

Public Function GetEmailAddy(Sin As String) As String
    Dim s As String
    If InStr(1, Sin, "@") = 0 Then
        GetEmailAddy = ""
        Exit Function
    End If

    s = Replace(Sin, Chr(10), " ")
    s = Replace(s, Chr(13), " ")
    s = Application.WorksheetFunction.Trim(s)
    ary = Split(s, " ")

    For Each a In ary
        If InStr(1, a, "@") > 0 Then
            GetEmailAddy = a
            Exit Function
        End If
    Next a
End Function

User Defined Functions (UDFs) are very easy to install and use:

  1. ALT-F11 brings up the VBE window
  2. ALT-I ALT-M opens a fresh module
  3. paste the stuff in and close the VBE window

If you save the workbook, the UDF will be saved with it. If you are using a version of Excel later then 2003, you must save the file as .xlsm rather than .xlsx

To remove the UDF:

  1. bring up the VBE window as above
  2. clear the code out
  3. close the VBE window

To use the UDF from Excel:

=GetEmailAddy(A1)

To learn more about macros in general, see:

http://www.mvps.org/dmcritchie/excel/getstarted.htm

and

http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx

and for specifics on UDFs, see:

http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx

Macros must be enabled for this to work!

Here are some examples:

enter image description here