How to extract decimal number from string in C#

string sentence = "X10 cats, Y20 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, @"\D+");

For this code I get these values in the digits array

10,20,40,1

string sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, @"\D+");

For this code I get these values in the digits array

10,4,20,5,40,1

But I would like to get like

10.4,20.5,40,1 as decimal numbers. How can I achieve this?


Solution 1:

Small improvement to @Michael's solution:

// NOTES: about the LINQ:
// .Where() == filters the IEnumerable (which the array is)
//     (c=>...) is the lambda for dealing with each element of the array
//     where c is an array element.
// .Trim()  == trims all blank spaces at the start and end of the string
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
    .Where(c => c != "." && c.Trim() != "");

Returns:

10.4
20.5
40
1

The original solution was returning

[empty line here]
10.4
20.5
40
1
.

Solution 2:

The decimal/float number extraction regex can be different depending on whether and what thousand separators are used, what symbol denotes a decimal separator, whether one wants to also match an exponent, whether or not to match a positive or negative sign, whether or not to match numbers that may have leading 0 omitted, whether or not extract a number that ends with a decimal separator.

A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:

[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?

I only changed the capturing group to a non-capturing one (added ?: after (). It matches enter image description here

If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \. with a character class (or a bracket expression) [.,]:

[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
           ^^^^

Note the expressions above match both integer and floats. To match only float/decimal numbers make sure the fractional pattern part is obligatory by removing the second ? after \. (demo):

[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
            ^

Now, 34 is not matched: enter image description here is matched.

If you do not want to match float numbers without leading zeros (like .5) make the first digit matching pattern obligatory (by adding + quantifier, to match 1 or more occurrences of digits):

[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
          ^

See this demo. Now, it matches much fewer samples: enter image description here

Now, what if you do not want to match <digits>.<digits> inside <digits>.<digits>.<digits>.<digits>? How to match them as whole words? Use lookarounds:

[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)

And a demo here:

enter image description here

Now, what about those floats that have thousand separators, like 12 123 456.23 or 34,345,767.678? You may add (?:[,\s][0-9]+)* after the first [0-9]+ to match zero or more sequences of a comma or whitespace followed with 1+ digits:

[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])

See the regex demo:

enter image description here

Swap a comma with \. if you need to use a comma as a decimal separator and a period as as thousand separator.

Now, how to use these patterns in C#?

var results = Regex.Matches(input, @"<PATTERN_HERE>")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

Solution 3:

try

Regex.Split (sentence, @"[^0-9\.]+")

Solution 4:

You'll need to allow for decimal places in your regular expression. Try the following:

\d+(\.\d+)?

This will match the numbers rather than everything other than the numbers, but it should be simple to iterate through the matches to build your array.

Something to keep in mind is whether you should also be looking for negative signs, commas, etc.