Regex split numbers and letter groups without spaces

If I have a string like "11E12C108N", which is a concatenation of letter groups and digit groups, how do I split it into those groups when there is no delimiter (such as a space) in between?

For example, I want the resulting split to be:

tokens[0] = "11"
tokens[1] = "E"
tokens[2] = "12"
tokens[3] = "C"
tokens[4] = "108"
tokens[5] = "N"

I have this right now.

public static void main(String[] args) {

    String stringToSplit = "11E12C108N";

    Pattern pattern = Pattern.compile("\\d+\\D+");
    Matcher matcher = pattern.matcher(stringToSplit);

    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}

Which gives me:

11E
12C
108N

Can I make the original regex do a complete split in one go, instead of having to run the regex again on the intermediate tokens?


Solution 1:

Use the following regex and collect all of its matches; each match is one of the tokens you are looking for. The alternation matches either a run of digits or a run of non-digits.

\d+|\D+

In Java, I think the code would look something like this:

Matcher matcher = Pattern.compile("\\d+|\\D+").matcher(theString);
while (matcher.find())
{
    // append matcher.group() to your list
}
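For completeness, here is a minimal runnable sketch of the same loop that collects the matches into a list. The class name TokenSplitter and the method name tokenize are made up for illustration; only Pattern, Matcher and the regex come from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class TokenSplitter {
    // Alternation: a run of digits OR a run of non-digits.
    private static final Pattern TOKEN = Pattern.compile("\\d+|\\D+");

    // Returns every digit group and non-digit group, in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("11E12C108N")); // prints [11, E, 12, C, 108, N]
    }
}
```

Because each find() consumes one whole run, the tokens come out in a single pass and no second regex is needed on intermediate results.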

Solution 2:

You can also use lookaround assertions in the regex passed to split:

String stringToSplit = "11E12C108N";
String[] tokens = stringToSplit.split("(?<=\\d)(?=\\D)|(?=\\d)(?<=\\D)");
System.out.println(Arrays.toString(tokens));

Output: [11, E, 12, C, 108, N]

The idea is to split at the positions between a digit (\d) and a non-digit (\D). In other words, at each zero-width position (an empty string) that has:

  • digit before (?<=\d) and non-digit after it (?=\D)
  • non-digit before (?<=\D) and digit after it (?=\d)
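The two boundary conditions listed above can be sketched as a small precompiled pattern. The names LookaroundSplit, BOUNDARY and splitAtBoundaries are hypothetical, and the second input string is only an illustration of a longer letter group.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class LookaroundSplit {
    // Zero-width boundary: digit on the left and non-digit on the right, or the reverse.
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");

    // Splits at every digit/non-digit boundary without consuming any characters.
    static String[] splitAtBoundaries(String input) {
        return BOUNDARY.split(input);
    }

    public static void main(String[] args) {
        // prints [11, E, 12, C, 108, N]
        System.out.println(Arrays.toString(splitAtBoundaries("11E12C108N")));
        // Letter groups longer than one character stay together: prints [7, AB, 42]
        System.out.println(Arrays.toString(splitAtBoundaries("7AB42")));
    }
}
```

Since the lookarounds match positions rather than characters, nothing is removed from the input. Note that \D matches any non-digit, so spaces and punctuation would be grouped together with the letters.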

You can find more information about (?<=..) and (?=..) (and a few other constructs) at http://www.regular-expressions.info/lookaround.html