Is there a way to split strings with String.split() and include the delimiters?

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

I can split this string into its parts using String.split, but there seems to be no way to get the actual strings that matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want:

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?

You can use lookahead and lookbehind, which are zero-width assertions supported by Java regular expressions.
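A minimal sketch, assuming the input string "a;b;c;d", that splits with a lookbehind, a lookahead, and both combined (the class name LookaroundSplit is only for illustration):

```java
import java.util.Arrays;

public class LookaroundSplit {
    public static void main(String[] args) {
        String input = "a;b;c;d";
        // Lookbehind: split at the empty position right after each ";"
        System.out.println(Arrays.toString(input.split("(?<=;)")));
        // Lookahead: split at the empty position right before each ";"
        System.out.println(Arrays.toString(input.split("(?=;)")));
        // Both: split before and after each ";", so ";" becomes its own element
        System.out.println(Arrays.toString(input.split("((?<=;)|(?=;))")));
    }
}
```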


And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty string at every position immediately before or immediately after a ;.
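To see those empty matches directly, you can iterate with a Matcher instead of splitting. A minimal sketch (the class and method names are only for illustration) that collects the index of every match of the combined pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundPositions {
    // Returns the start index of every match; for a pure lookaround pattern
    // each match is empty, so start() == end()
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // ";" sits at indices 1, 3 and 5, so the empty matches fall at 1..6
        System.out.println(matchPositions("((?<=;)|(?=;))", "a;b;c;d")); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

String.split cuts the input at exactly these indices, which is why each ; ends up as a separate element.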

EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regular expressions. One thing I do to make them more readable is to store the pattern in a variable whose name says what the regular expression does. You can even put placeholders (e.g. %1$s) in it and use Java's String.format to replace them with the actual delimiter you need; for example:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    // Expands to "((?<=;)|(?=;))", i.e. split before and after every ";"
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
}
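The same placeholder idea can be stretched to the question's case of several different delimiters. A minimal sketch, assuming the delimiters are plain literal strings (DelimiterA, DelimiterB and DelimiterC are just the question's placeholders, and the helper name withDelimiters is hypothetical): each delimiter goes through Pattern.quote so characters such as "." are matched literally, and the quoted pieces are joined into one alternation before being substituted into the pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WithDelimiters {
    // Same placeholder pattern as above; %1$s receives the delimiter regex
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: quotes each literal delimiter and joins them
    // into a non-capturing alternation such as (?:\QA\E|\QB\E)
    static String withDelimiters(String... delimiters) {
        String alternation = Arrays.stream(delimiters)
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        return String.format(WITH_DELIMITER, alternation);
    }

    public static void main(String[] args) {
        String text = "Text1DelimiterAText2DelimiterCText3DelimiterBText4";
        String[] parts = text.split(withDelimiters("DelimiterA", "DelimiterB", "DelimiterC"));
        System.out.println(Arrays.toString(parts));
        // prints [Text1, DelimiterA, Text2, DelimiterC, Text3, DelimiterB, Text4]

        // A literal "." works too because it is quoted
        System.out.println(Arrays.toString("a.b.c".split(withDelimiters("."))));
        // prints [a, ., b, ., c]
    }
}
```

Because Java requires a lookbehind to have a bounded maximum length, this works for literal delimiters, but an open-ended delimiter regex such as ;+ inside the lookbehind would be rejected with a PatternSyntaxException.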

You want to use lookarounds and split on zero-width matches. Here are some examples:

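public class SplitNDump {
    // Prints each element wrapped in brackets, then a newline
    static void dump(String[] arr) {
        for (String s : arr) {
            System.out.format("[%s]", s);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        dump("1,234,567,890".split(","));
        // "[1][234][567][890]"
        dump("1,234,567,890".split("(?=,)"));
        // "[1][,234][,567][,890]"
        dump("1,234,567,890".split("(?<=,)"));
        // "[1,][234,][567,][890]"
        dump("1,234,567,890".split("(?<=,)|(?=,)"));
        // "[1][,][234][,][567][,][890]"

        dump(":a:bb::c:".split("(?=:)|(?<=:)"));
        // "[][:][a][:][bb][:][:][c][:]" on Java 7; Java 8+ omits the leading empty string
        dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
        // "[:][a][:][bb][:][:][c][:]"
        dump(":::a::::b  b::c:".split("(?=(?!^):)(?<!:)|(?!:)(?<=:)"));
        // "[:::][a][::::][b  b][::][c][:]"
        dump("a,bb:::c  d..e".split("(?!^)\\b"));
        // "[a][,][bb][:::][c][  ][d][..][e]"

        dump("ArrayIndexOutOfBoundsException".split("(?<=[a-z])(?=[A-Z])"));
        // "[Array][Index][Out][Of][Bounds][Exception]"
        dump("1234567890".split("(?<=\\G.{4})"));
        // "[1234][5678][90]"

        // Split at the end of each run of two or more identical characters
        dump("Boooyaaaah! Yippieeee!!".split("(?<=(?=(.)\\1(?!\\1))..)"));
        // "[Booo][yaaaa][h! Yipp][ieeee][!!]"
    }
}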

And yes, that is a triply nested assertion in the last pattern.

A very naive solution that doesn't involve a regex would be to perform a string replace on your delimiter, along these lines (assuming a comma as the delimiter):

fullString.replace(",", "~,~")

You can replace the tilde (~) with any marker that is guaranteed not to appear in your text.

Then, if you split on the new marker, I believe you will get the desired result.
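A minimal Java sketch of this replace-then-split idea, assuming literal delimiters and that the marker ~ never occurs in the input (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class MarkerSplit {
    // Assumption: MARKER never occurs in the input text
    static final String MARKER = "~";

    // Surrounds every occurrence of each literal delimiter with MARKER,
    // then splits on MARKER so the delimiters become separate elements
    static String[] splitKeepingDelimiters(String text, String... delimiters) {
        String marked = text;
        for (String d : delimiters) {
            // String.replace(CharSequence, CharSequence) is a literal replace, not a regex
            marked = marked.replace(d, MARKER + d + MARKER);
        }
        // "~" has no special meaning in a regex, so it can be passed to split directly
        return marked.split(MARKER);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingDelimiters("a,b,c", ",")));
        // prints [a, ,, b, ,, c]
    }
}
```

Being naive, it has edge cases: two adjacent delimiters produce an empty string between them, a leading delimiter produces an empty first element, and a delimiter that contains another delimiter (or the marker itself) gets marked twice.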