RegEx to split camelCase or TitleCase (advanced)

Solution 1:

The following regex works for all of the above examples:

public static void main(String[] args)
{
    for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
        System.out.println(w);
    }
}   

It works by forcing the negative lookbehind to not only ignore matches at the start of the string, but to also ignore matches where a capital letter is preceded by another capital letter. This handles cases like "VALUE".

The first part of the regex on its own fails on "eclipseRCPExt" by failing to split between "RPC" and "Ext". This is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]. This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.

Solution 2:

It seems you are making this more complicated than it needs to be. For camelCase, the split location is simply anywhere an uppercase letter immediately follows a lowercase letter:

(?<=[a-z])(?=[A-Z])

Here is how this regex splits your example data:

  • value -> value
  • camelValue -> camel / Value
  • TitleValue -> Title / Value
  • VALUE -> VALUE
  • eclipseRCPExt -> eclipse / RCPExt

The only difference from your desired output is with the eclipseRCPExt, which I would argue is correctly split here.

Addendum - Improved version

Note: This answer recently got an upvote and I realized that there is a better way...

By adding a second alternative to the above regex, all of the OP's test cases are correctly split.

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

Here is how the improved regex splits the example data:

  • value -> value
  • camelValue -> camel / Value
  • TitleValue -> Title / Value
  • VALUE -> VALUE
  • eclipseRCPExt -> eclipse / RCP / Ext

Edit:20130824 Added improved version to handle RCPExt -> RCP / Ext case.

Solution 3:

Another solution would be to use a dedicated method in commons-lang: StringUtils#splitByCharacterTypeCamelCase