How to extract numbers from string and separate them by comma (or other separator) (substitute, replace, erase, isnumber, remove, delete, ...)

I need to extract numbers from a string in Excel and the numbers need to be separated by a comma as shown in row 3 (or any other separator).

enter image description here

I can extract the numbers in row 1 by:

=SUMPRODUCT(MID(0&A1, LARGE(INDEX(ISNUMBER(--MID(A1, ROW(INDIRECT("1:"&LEN(A1))), 1)) * ROW(INDIRECT("1:"&LEN(A1))), 0), ROW(INDIRECT("1:"&LEN(A1))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A1)))/10)´´´

Or in row 2 by:

=SUM(MID(0&A2,LARGE(INDEX(ISNUMBER(--MID(A2,ROW($1:$99),1))*ROW($1:$99),),ROW($1:$99))+1,1)*10^ROW($1:$99)/10)

However, I would need them to be separated somehow.

NOTE: I have up to 200 different combination of numbers and car brands, which cause simple substitute functions to reach their limits. I prefer a formula based solution without hidden interim columns if possible (certainly no manual mouse clicking). My VBA skill are low, but I'm open for new challenges!


Solution 1:

Edit:
It seems the real data is more complicated than the sample initially submitted. The simplest way to program this is to use Regular Expressions to extract the numbers

  • Extract any digit string that follows either the start of the string; or a comma optionally followed by one or more spaces

Edit2 Converted to late-binding after I read template will be distributed

Option Explicit
Function extrNums(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As Variant
    Const sPat As String = "(?:^|,\s*)(\d+)\b"
    
Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .MultiLine = True
    .Pattern = sPat
    If .Test(cellRef) Then
        Set MC = .Execute(cellRef)
        For Each M In MC
            sTemp = sTemp & ", " & M.SubMatches(0)
        Next M
    End If
End With

extrNums = Mid(sTemp, 3)

End Function

enter image description here

Regex Explanation
extract numbers
(?:^|,\s*)(\d+)\b

Options: ^$ match at line breaks

  • Match the regular expression below (?:^|,\s*)
    • Match this alternative ^
      • Assert position at the beginning of the string ^
    • Or match this alternative ,\s*
      • Match the character “,” literally ,
      • Match a single character that is a “whitespace character” \s*
        • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
  • Match the regex below and capture its match into backreference number 1 (\d+)
    • Match a single character that is a “digit” \d+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
  • Assert position at a word boundary \b

Created with RegexBuddy

Solution 2:

With Office 365 or 2019 Excel we can use TEXTJOIN and FILTERXML.

=TEXTJOIN(",",,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,","," ")," ","</s><s>")&"</s></t>","//s[number()=.]"))

The FILTERXML parses a made up string that looks like xml. We substitute the spaces and the commas for </s><s> then only return the nodes that are numeric.

The TEXTJOIN takes the array returned and adds a comma between each element.

enter image description here

Without text join we can concatenate many:

IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][1]"),"")

The added [1] to the FILTERXML is which node to return. So we can string a bunch together:

=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][1]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][2]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][3]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][4]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][5]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][6]"),"")

This will do a max of 6, if there possibly can be more add more lines changing the number in the [#] to the next.

enter image description here