Simplify regex code in C#: Add a space between a digit/decimal and unit
I have some regex code in C# that adds a space between a number and a unit, with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions (no space is added between the number and these characters): %, :, e and E.
I have tried, but my regex knowledge is not top-notch. Could someone help me reduce this code while keeping the same results?
Thank you!
For your example data, you might use 2 capture groups, where the second group is inside an optional part.
In the MatchEvaluator callback of Regex.Replace, check whether capture group 2 succeeded. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
- ( Capture group 1
  - \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
- ) Close group 1
- (?: Non-capture group to match as a whole
  - \s*([%:eE]) Match optional whitespace chars, and capture one of %, :, e or E in group 2
- )? Close the non-capture group and make it optional
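using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = new string[]
{
    "10ANYUNIT",
    "10:something",
    "10 : something",
    "10 %",
    "40 e-5",
    "40 E-05",
};
// Group 1: the number; group 2 (optional): one of %, :, e, E, with the whitespace before it dropped
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
    Regex.Replace(
        s, pattern, m =>
            // Keep the exception character right after the number, otherwise add a space
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
    )
);
Array.ForEach(result.ToArray(), Console.WriteLine);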
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
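The console output above hides one detail: when a string ends in a number, as in "40 e-5" or "40 E-05", the trailing digits also match without group 2, so a space is appended at the very end ("40e-5 "). The question's original chain of replacements produces the same trailing space for "40 e-5". A minimal sketch that makes this visible, assuming top-level statements; Raw is a hypothetical helper name for the answer's replacement, and TrimEnd is just one possible way to drop the extra space:

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";

// Hypothetical helper: the answer's replacement, without any trimming
string Raw(string input) => Regex.Replace(input, pattern, m =>
    m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));

foreach (string sample in new[] { "40 e-5", "10 %" })
{
    // The brackets make trailing whitespace visible
    Console.WriteLine($"[{Raw(sample)}] -> [{Raw(sample).TrimEnd()}]");
}
```

Note that TrimEnd also removes any whitespace that was already at the end of the input, which may or may not be wanted.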
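In .NET, \d also matches digits from other scripts (any Unicode decimal digit), \s also matches a newline, and without an anchor the number can be matched in the middle of a word. A slightly more precise pattern uses [0-9], only spaces and tabs before the exception character, and a word boundary at the start: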
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
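To compare the two patterns, the sketch below (again assuming top-level statements; Apply, loose and strict are hypothetical names) applies the same MatchEvaluator with each pattern to three inputs aimed at the cases mentioned above: digits inside a word, a newline between the number and %, and Arabic-Indic digits. The newline is printed as \n so each result stays on one line.

```csharp
using System;
using System.Text.RegularExpressions;

string loose = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
string strict = @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?";

// Hypothetical helper: same callback as the answer, with the pattern passed in
string Apply(string input, string p) => Regex.Replace(input, p, m =>
    m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));

string[] inputs =
{
    "abc10",           // digits inside a word
    "10\n%",           // newline between the number and %
    "\u0661\u0660mg",  // Arabic-Indic digits, matched by \d but not by [0-9]
};

foreach (string s in inputs)
{
    // Show newlines as \n so each comparison fits on one line
    string looseResult = Apply(s, loose).Replace("\n", "\\n");
    string strictResult = Apply(s, strict).Replace("\n", "\\n");
    Console.WriteLine($"{looseResult}  |  {strictResult}");
}
```

With the first pattern, "abc10" gets a trailing space, "10\n%" is joined into "10%", and the Arabic-Indic digits are treated as a number. The second pattern leaves "abc10" and the Arabic-Indic input unchanged, and because a newline is not in [\p{Zs}\t], it appends a space after 10 and keeps the % on the next line.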