Regex split string but keep separators
I'd like to do a Regex.Split on some separators but I'd like to keep the separators. To give an example of what I'm trying:
"abc[s1]def[s2][s3]ghi" --> "abc", "[s1]", "def", "[s2]", "[s3]", "ghi"
The regular expression I've come up with is new Regex("\\[|\\]|\\]\\["). However, this gives me the following:
"abc[s1]def[s2][s3]ghi" --> "abc", "s1", "def", "s2", "", "s3", "ghi"
The separators have disappeared (which makes sense given my regex). Is there a way to write the regex so that the separators themselves are preserved?
Solution 1:
Use zero-length matching lookarounds; you want to split on
(?=\[)|(?<=\])
That is, split anywhere we assert a match of a literal [ ahead, or a match of a literal ] behind.
As a C# string literal, this is
@"(?=\[)|(?<=\])"
See also
- regular-expressions.info/Lookarounds
Related questions
- Java split is eating my characters. -- has many examples
Example in Java
System.out.println(java.util.Arrays.toString(
"abc[s1]def[s2][s3]ghi".split("(?=\\[)|(?<=\\])")
));
// prints "[abc, [s1], def, [s2], [s3], ghi]"
System.out.println(java.util.Arrays.toString(
"abc;def;ghi;".split("(?<=;)")
));
// prints "[abc;, def;, ghi;]"
System.out.println(java.util.Arrays.toString(
"OhMyGod".split("(?=(?!^)[A-Z])")
));
// prints "[Oh, My, God]"
Solution 2:
You could use .Matches instead of .Split, for example (http://www.ideone.com/gUjRM):
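using System.Text.RegularExpressions;

string x = "abc[s1]def[s2][s3]ghi";
// Match either a run of non-'[' characters or one complete bracketed token.
var r = new Regex(@"[^\[]+|\[[^\]]+\]");
var ms = r.Matches(x);
// do stuff with the MatchCollection `ms`.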
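For comparison with the Java split examples in Solution 1, here is a rough sketch of the same match-based idea in Java. The class name TokenizeDemo and the helper tokenize are made up for illustration and are not part of the original answer; the pattern is the one from the C# snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizeDemo {
    // Same pattern as the C# snippet: a run of non-'[' characters,
    // or one complete bracketed token such as "[s1]".
    private static final Pattern TOKEN = Pattern.compile("[^\\[]+|\\[[^\\]]+\\]");

    // Hypothetical helper: collects every match, in order of appearance.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc[s1]def[s2][s3]ghi"));
        // prints "[abc, [s1], def, [s2], [s3], ghi]"
    }
}
```

One thing to keep in mind with this approach: text that neither alternative can match (for instance a [ with no closing ]) is skipped rather than returned, whereas a split never drops characters that lie between separators.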