Simplify regex code in C#: Add a space between a digit/decimal and unit

I have some regex code written in C# that adds a space between a number and a unit, with some exceptions:

dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");

Example:

10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05

should become

10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05

The exceptions are %, E, e and :. I have tried, but my regex knowledge is not top-notch. Would someone be able to help me reduce this code while keeping the same expected results?

Thank you!


For your example data, you could use two capture groups, with the second group inside an optional part.

In the callback of the replace, check whether capture group 2 matched. If it did, use it in the replacement; otherwise add a space.

(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
  • ( Capture group 1
    • \d+(?:\.\d+)? match 1+ digits with an optional decimal part
  • ) Close group 1
  • (?: Non-capture group, matched as a whole
    • \s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
  • )? Close non capture group and make it optional

.NET regex demo

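using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = new string[]
{
    "10ANYUNIT",
    "10:something",
    "10 : something",
    "10 %",
    "40 e-5",
    "40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";

// If group 2 (%, :, e or E) matched, put it directly after the number;
// otherwise append a single space after the number.
var result = strings.Select(s =>
    Regex.Replace(
        s, pattern, m =>
        m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
    )
);

Array.ForEach(result.ToArray(), Console.WriteLine);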

Output

10 ANYUNIT
10:something
10: something
10%
40e-5 
40E-05
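
The trailing space after 40e-5 appears because the exponent digits are matched as a number of their own, with no group 2 after them, so a space is appended. If that is unwanted, one option is to trim the result. The following is a minimal sketch, assuming trailing whitespace is never meaningful in a dosage string; DosageCleanup and AddUnitSpace are hypothetical names, not part of the original code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DosageCleanup
{
    // Same pattern as above: a number, then optionally whitespace and one of % : e E.
    private const string Pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";

    // Hypothetical helper: applies the replacement, then removes trailing whitespace.
    public static string AddUnitSpace(string input)
    {
        string replaced = Regex.Replace(input, Pattern, m =>
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));

        // Assumption: a space left at the very end of the string is never wanted.
        return replaced.TrimEnd();
    }
}
```

With this, DosageCleanup.AddUnitSpace("40 e-5") gives "40e-5" without the trailing space, while "10ANYUNIT" still becomes "10 ANYUNIT".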

In .NET, \d also matches digits from other scripts, \s also matches a newline, and without a boundary the pattern can start matching in the middle of a word (for example the digits in abc123). A slightly more precise pattern could be:

\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
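
To show how the stricter pattern could be used, here is a minimal sketch with the same callback as before. DosageFormatter and Normalize are hypothetical names chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DosageFormatter
{
    // \b        : the number must start at a word boundary
    // [0-9]     : ASCII digits only
    // [\p{Zs}\t]: space separators and tabs, but not newlines
    private static readonly Regex NumberThenUnit = new Regex(
        @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?");

    // Hypothetical helper: keeps % : e E directly after the number,
    // otherwise appends a single space after the number.
    public static string Normalize(string input) =>
        NumberThenUnit.Replace(input, m =>
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));
}
```

For example, DosageFormatter.Normalize("1.5mg") gives "1.5 mg", while "abc123" is left unchanged because there is no word boundary before the digits.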