C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.

How would I do this in C#?


Solution 1:

If the split chars were ,, ., and ;, I'd try:

using System.Text.RegularExpressions;
...    
string[] parts = Regex.Split(originalString, @"(?<=[.,;])")

(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.

Solution 2:

If you want the delimiter to be its "own split", you can use Regex.Split e.g.:

string input = "plum-pear";
string pattern = "(-)";

string[] substrings = Regex.Split(input, pattern);    // Split on hyphens
foreach (string match in substrings)
{
   Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
//    'plum'
//    '-'
//    'pear'

So if you are looking for splitting a mathematical formula, you can use the following Regex

@"([*()\^\/]|(?<!E)[\+\-])" 

This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02

So:

Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")

Yields:

  • 10E-02
  • *
  • x
  • +
  • sin
  • (
  • x
  • )
  • ^
  • 2

Solution 3:

Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:

public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
    int start = 0, index;

    while ((index = s.IndexOfAny(delims, start)) != -1)
    {
        if(index-start > 0)
            yield return s.Substring(start, index - start);
        yield return s.Substring(index, 1);
        start = index + 1;
    }

    if (start < s.Length)
    {
        yield return s.Substring(start);
    }
}