How do I convert CamelCase into human-readable names in Java?
This works with your testcases:
static String splitCamelCase(String s) {
return s.replaceAll(
String.format("%s|%s|%s",
"(?<=[A-Z])(?=[A-Z][a-z])",
"(?<=[^A-Z])(?=[A-Z])",
"(?<=[A-Za-z])(?=[^A-Za-z])"
),
" "
);
}
Here's a test harness:
String[] tests = {
"lowercase", // [lowercase]
"Class", // [Class]
"MyClass", // [My Class]
"HTML", // [HTML]
"PDFLoader", // [PDF Loader]
"AString", // [A String]
"SimpleXMLParser", // [Simple XML Parser]
"GL11Version", // [GL 11 Version]
"99Bottles", // [99 Bottles]
"May5", // [May 5]
"BFG9000", // [BFG 9000]
};
for (String test : tests) {
System.out.println("[" + splitCamelCase(test) + "]");
}
It uses zero-length matching regex with lookbehind and lookforward to find where to insert spaces. Basically there are 3 patterns, and I use String.format
to put them together to make it more readable.
The three patterns are:
UC behind me, UC followed by LC in front of me
XMLParser AString PDFLoader
/\ /\ /\
non-UC behind me, UC in front of me
MyClass 99Bottles
/\ /\
Letter behind me, non-letter in front of me
GL11 May5 BFG9000
/\ /\ /\
References
- regular-expressions.info/Lookarounds
Related questions
Using zero-length matching lookarounds to split:
- Regex split string but keep separators
- Java split is eating my characters
You can do it using org.apache.commons.lang.StringUtils
StringUtils.join(
StringUtils.splitByCharacterTypeCamelCase("ExampleTest"),
' '
);
The neat and shorter solution :
StringUtils.capitalize(StringUtils.join(StringUtils.splitByCharacterTypeCamelCase("yourCamelCaseText"), StringUtils.SPACE)); // Your Camel Case Text
If you don't like "complicated" regex's, and aren't at all bothered about efficiency, then I've used this example to achieve the same effect in three stages.
String name =
camelName.replaceAll("([A-Z][a-z]+)", " $1") // Words beginning with UC
.replaceAll("([A-Z][A-Z]+)", " $1") // "Words" of only UC
.replaceAll("([^A-Za-z ]+)", " $1") // "Words" of non-letters
.trim();
It passes all the test cases above, including those with digits.
As I say, this isn't as good as using the one regular expression in some other examples here - but someone might well find it useful.