How to extract numbers from string and separate them by comma (or other separator) (substitute, replace, erase, isnumber, remove, delete, ...)
I need to extract numbers from a string in Excel and the numbers need to be separated by a comma as shown in row 3 (or any other separator).
I can extract the numbers in row 1 by:
=SUMPRODUCT(MID(0&A1, LARGE(INDEX(ISNUMBER(--MID(A1, ROW(INDIRECT("1:"&LEN(A1))), 1)) * ROW(INDIRECT("1:"&LEN(A1))), 0), ROW(INDIRECT("1:"&LEN(A1))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A1)))/10)´´´
Or in row 2 by:
=SUM(MID(0&A2,LARGE(INDEX(ISNUMBER(--MID(A2,ROW($1:$99),1))*ROW($1:$99),),ROW($1:$99))+1,1)*10^ROW($1:$99)/10)
However, I would need them to be separated somehow.
NOTE: I have up to 200 different combination of numbers and car brands, which cause simple substitute functions to reach their limits. I prefer a formula based solution without hidden interim columns if possible (certainly no manual mouse clicking). My VBA skill are low, but I'm open for new challenges!
Solution 1:
Edit:
It seems the real data is more complicated than the sample initially submitted. The simplest way to program this is to use Regular Expressions to extract the numbers
- Extract any digit string that follows either the start of the string; or a
comma
optionally followed by one or morespaces
Edit2 Converted to late-binding after I read template will be distributed
Option Explicit
Function extrNums(cellRef) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As Variant
Const sPat As String = "(?:^|,\s*)(\d+)\b"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = sPat
If .Test(cellRef) Then
Set MC = .Execute(cellRef)
For Each M In MC
sTemp = sTemp & ", " & M.SubMatches(0)
Next M
End If
End With
extrNums = Mid(sTemp, 3)
End Function
Regex Explanation
extract numbers
(?:^|,\s*)(\d+)\b
Options: ^$ match at line breaks
-
Match the regular expression below
(?:^|,\s*)
-
Match this alternative
^
-
Assert position at the beginning of the string
^
-
Assert position at the beginning of the string
-
Or match this alternative
,\s*
-
Match the character “,” literally
,
-
Match a single character that is a “whitespace character”
\s*
-
Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
*
-
Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
-
Match the character “,” literally
-
Match this alternative
-
Match the regex below and capture its match into backreference number 1
(\d+)
-
Match a single character that is a “digit”
\d+
-
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
+
-
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
-
Match a single character that is a “digit”
-
Assert position at a word boundary
\b
Created with RegexBuddy
Solution 2:
With Office 365 or 2019 Excel we can use TEXTJOIN and FILTERXML.
=TEXTJOIN(",",,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,","," ")," ","</s><s>")&"</s></t>","//s[number()=.]"))
The FILTERXML parses a made up string that looks like xml. We substitute the spaces and the commas for </s><s>
then only return the nodes that are numeric.
The TEXTJOIN takes the array returned and adds a comma between each element.
Without text join we can concatenate many:
IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][1]"),"")
The added [1]
to the FILTERXML is which node to return. So we can string a bunch together:
=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][1]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][2]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][3]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][4]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][5]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," ")," "," ")," ","</s><s>")&"</s></t>","//s[number()=.][6]"),"")
This will do a max of 6, if there possibly can be more add more lines changing the number in the [#]
to the next.