How to keep the delimiters of Regex.Split?

I'd like to split a string using the Split method of the Regex class. The problem is that it removes the delimiters, and I'd like to keep them, preferably as separate elements in the resulting array.

According to other discussions that I've found, there are only inconvenient ways to achieve that.

Any suggestions?


Solution 1:

Just put the pattern into a capturing group, and the matched delimiters will also be included in the result.

string[] result = Regex.Split("123.456.789", @"(\.)");

Result:

{ "123", ".", "456", ".", "789" }

This also works for many other languages:

  • JavaScript: "123.456.789".split(/(\.)/)
  • Python: re.split(r"(\.)", "123.456.789")
  • Perl: split(/(\.)/, "123.456.789")

It does not work in Java, though: String.split does not include captured groups in its result.
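To make the effect of the capturing group concrete, here is a minimal sketch that runs the same split with and without the parentheses. The input string "a,b;c" and the variable names are made up for illustration, and the snippet assumes C# 9 or later so that top-level statements compile.

```csharp
using System;
using System.Text.RegularExpressions;

// Without a capturing group, the delimiters are dropped.
string[] dropped = Regex.Split("a,b;c", @"[,;]");

// With a capturing group, each matched delimiter becomes its own element.
string[] kept = Regex.Split("a,b;c", @"([,;])");

Console.WriteLine(string.Join("|", dropped)); // a|b|c
Console.WriteLine(string.Join("|", kept));    // a|,|b|;|c
```

Note that if the input starts or ends with a delimiter, the result will contain an empty string at that end, just as with a plain Regex.Split.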

Solution 2:

Use Regex.Matches to find the separators in the string, then collect the text between them together with the separators themselves.

Example:

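// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
string input = "asdf,asdf;asdf.asdf,asdf,asdf";

var values = new List<string>();
int pos = 0; // start of the text that has not been consumed yet
foreach (Match m in Regex.Matches(input, "[,.;]")) {
  // Text between the previous separator and this one
  values.Add(input.Substring(pos, m.Index - pos));
  // The separator itself
  values.Add(m.Value);
  pos = m.Index + m.Length;
}
// Remaining text after the last separator
values.Add(input.Substring(pos));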
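If this is needed in more than one place, the loop above could be wrapped in a small helper. The following is only a sketch: SplitKeepingSeparators is a hypothetical name, not a .NET method, and unlike the loop above it skips empty values between adjacent separators, which may or may not be what you want. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET): yields values and separators in order,
// skipping empty values between adjacent separators.
static IEnumerable<string> SplitKeepingSeparators(string input, string separatorPattern)
{
    int pos = 0;
    foreach (Match m in Regex.Matches(input, separatorPattern))
    {
        if (m.Length == 0) continue; // ignore zero-width matches
        if (m.Index > pos) yield return input.Substring(pos, m.Index - pos);
        yield return m.Value;
        pos = m.Index + m.Length;
    }
    if (pos < input.Length) yield return input.Substring(pos);
}

Console.WriteLine(string.Join(" ", SplitKeepingSeparators("a,,b;", "[,;]")));
// a , , b ;
```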

Solution 3:

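Suppose the input is "abc1defg2hi3jkl" and the delimiters are runs of digits. Instead of splitting, match either a run of digits or a run of non-digits, so that every part of the string comes back as a match.

// Requires: using System.Linq; using System.Text.RegularExpressions;
string input = "abc1defg2hi3jkl";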
var parts = Regex.Matches(input, @"\d+|\D+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

Result:

{ "abc", "1", "defg", "2", "hi", "3", "jkl" }
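The same idea carries over to other delimiters: alternate between the delimiter pattern and its complement. As a rough sketch, here it is with a punctuation character class; the input string is made up for illustration, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "asdf,asdf;asdf.asdf";

// Match either a single separator or a run of non-separator characters.
var parts = Regex.Matches(input, @"[,.;]|[^,.;]+")
                 .Cast<Match>()
                 .Select(m => m.Value)
                 .ToList();

Console.WriteLine(string.Join(" ", parts));
// asdf , asdf ; asdf . asdf
```

One difference from Regex.Split: because only non-empty runs are matched, adjacent separators do not produce empty strings between them.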

Solution 4:

For Java:

// Requires: import java.util.Arrays;
// Split at the zero-width positions just after or just before a dot.
Arrays.stream("123.456.789".split("(?<=\\.)|(?=\\.)"))
      .forEach(System.out::println);

outputs:

123
.
456
.
789

Inspired by this post: How to split string but keep delimiters in java?