Java: Searching for a Regex that splits a text into separate words, including letters, numbers and ' between letters
If you want to match words using \w, instead of using split you can use word boundaries and assert that there is no ' at the left and at the right.
\b(?<!')\w+(?:'\w+)*\b(?!')
In Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "\\b(?<!')\\w+(?:'\\w+)*\\b(?!')";
String string = "Can't but not something like: 'HEART'";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // group(0) is the entire match
    System.out.println(matcher.group(0));
}
Output
Can't
but
not
something
like
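If you need the words as a list rather than printed one by one, the same pattern can be wrapped in a small helper. This is only a sketch: the class name WordExtractor and the method extractWords are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // Same pattern as above, compiled once so it can be reused across calls.
    private static final Pattern WORD =
            Pattern.compile("\\b(?<!')\\w+(?:'\\w+)*\\b(?!')");

    // Hypothetical helper: collects every match, in order of appearance.
    public static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [Can't, but, not, something, like]
        System.out.println(extractWords("Can't but not something like: 'HEART'"));
        // Prints [rock'n'roll, since, 1950]
        System.out.println(extractWords("rock'n'roll since 1950"));
    }
}
```

Note that by default \w in Java matches only [a-zA-Z0-9_], so underscores count as word characters and accented letters do not. If that matters for your input, compiling with Pattern.UNICODE_CHARACTER_CLASS is one option to look into.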