How to keep the delimiters of Regex.Split?
I'd like to split a string using the Split function in the Regex class. The problem is that it removes the delimiters, and I'd like to keep them, preferably as separate elements in the resulting array.
According to other discussions that I've found, there are only inconvenient ways to achieve that.
Any suggestions?
Solution 1:
Just put the pattern in a capturing group, and the matched delimiters will be included in the result as well.
string[] result = Regex.Split("123.456.789", @"(\.)");
Result:
{ "123", ".", "456", ".", "789" }
This also works in many other languages:
- JavaScript: "123.456.789".split(/(\.)/g)
- Python: re.split(r"(\.)", "123.456.789")
- Perl: split(/(\.)/g, "123.456.789")
(Not in Java, though: String.split does not return the captured groups.)
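To see what the capturing group changes, here is a minimal sketch using the Python variant listed above. The helper name split_keep is made up for illustration, and it assumes the pattern passed in contains no capturing groups of its own.

```python
import re

def split_keep(text, pattern):
    """Split text on pattern, keeping each delimiter as its own element.

    Hypothetical helper: it wraps the pattern in a single capturing group.
    Assumes the pattern itself contains no capturing groups.
    """
    return re.split(f"({pattern})", text)

# Without a capturing group the delimiters are dropped.
without_group = re.split(r"\.", "123.456.789")
# With a capturing group they come back as separate elements.
with_group = split_keep("123.456.789", r"\.")

print(without_group)  # ['123', '456', '789']
print(with_group)     # ['123', '.', '456', '.', '789']
```

If the pattern needs internal grouping, non-capturing groups (?:...) keep the result free of extra elements.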
Solution 2:
Use Regex.Matches to find the separators in the string, then collect the values and the separators in order.
Example:
string input = "asdf,asdf;asdf.asdf,asdf,asdf";
var values = new List<string>();
int pos = 0;
foreach (Match m in Regex.Matches(input, "[,.;]")) {
    // Text between the previous separator and this one.
    values.Add(input.Substring(pos, m.Index - pos));
    // The separator itself.
    values.Add(m.Value);
    pos = m.Index + m.Length;
}
// Whatever remains after the last separator.
values.Add(input.Substring(pos));
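For comparison, the same index-walking idea can be sketched in Python, where re.finditer plays the role of Regex.Matches. The function name split_with_separators is an assumption made for this example, not part of any library.

```python
import re

def split_with_separators(text, pattern):
    """Return the pieces of text and the separators matching pattern, in order."""
    parts = []
    pos = 0
    for m in re.finditer(pattern, text):
        parts.append(text[pos:m.start()])  # text before this separator
        parts.append(m.group())            # the separator itself
        pos = m.end()
    parts.append(text[pos:])               # tail after the last separator
    return parts

result = split_with_separators("asdf,asdf;asdf.asdf,asdf,asdf", r"[,.;]")
print(result)
# ['asdf', ',', 'asdf', ';', 'asdf', '.', 'asdf', ',', 'asdf', ',', 'asdf']
```

As in the C# loop, two adjacent separators produce an empty string between them, so filter those out if they are not wanted.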
Solution 3:
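Say the input is "abc1defg2hi3jkl" and the regex should pick out the digits. Instead of splitting, match either a run of digits or a run of non-digits, so that every character ends up in some match.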
// Requires: using System.Linq; using System.Text.RegularExpressions;
string input = "abc1defg2hi3jkl";
var parts = Regex.Matches(input, @"\d+|\D+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
Parts would be:
abc
1
defg
2
hi
3
jkl
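The same "match everything instead of splitting" idea, sketched in Python with re.findall as a stand-in for Regex.Matches. The variable names are illustrative only.

```python
import re

text = "abc1defg2hi3jkl"
# Each match is either a run of digits or a run of non-digits,
# so the matches together cover the whole string.
parts = re.findall(r"\d+|\D+", text)
print(parts)  # ['abc', '1', 'defg', '2', 'hi', '3', 'jkl']
```

This only works when the complement of the delimiter is easy to express (here \D for \d). Note also that consecutive delimiters are merged into a single element, unlike with Split.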
Solution 4:
For Java:
// Split at the zero-width positions just after or just before a dot.
Arrays.stream("123.456.789".split("(?<=\\.)|(?=\\.)"))
    .forEach((p) -> {
        System.out.println(p);
    });
outputs:
123
.
456
.
789
Inspired by this post: How to split string but keep delimiters in java?
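The lookaround trick is not specific to Java. As a rough sketch, the same pattern works with Python's re.split, assuming Python 3.7 or later (earlier versions do not split on empty matches).

```python
import re

# Split at the empty positions just after a dot (lookbehind)
# or just before a dot (lookahead). Needs Python 3.7+.
parts = re.split(r"(?<=\.)|(?=\.)", "123.456.789")
print(parts)  # ['123', '.', '456', '.', '789']
```

Because the pattern consumes no characters, nothing is removed from the string; the delimiters survive as their own elements.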