Java string split on all non-alphanumeric characters except apostrophes

I want to split a string in Java on any non-alphanumeric characters.

Currently I am doing it like this:

words = Str.split("\\W+");

However, I want to keep apostrophes (') in there. Is there a regular expression that preserves apostrophes but kicks the rest of the junk? Thanks.


words = Str.split("[^\\w']+");

Just add the apostrophe to the character class. \W is equivalent to [^\w], so you can write out the negated class and add ' to it.

Note, however, that \w also matches underscores. If you want to split on underscores as well, use [^a-zA-Z0-9']+ instead.
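To make the underscore difference concrete, here is a minimal sketch. The class name ApostropheSplit, the method names, and the sample sentence are made up for illustration and are not from the question.

```java
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
class ApostropheSplit {
    // Splits on runs of anything that is not a word character (\w) or an apostrophe.
    // Underscores count as word characters, so they stay inside the tokens.
    static String[] splitKeepingUnderscores(String text) {
        return text.split("[^\\w']+");
    }

    // Splits on runs of anything that is not an ASCII letter, digit, or apostrophe.
    // Underscores are treated as separators here.
    static String[] splitOnUnderscores(String text) {
        return text.split("[^a-zA-Z0-9']+");
    }

    public static void main(String[] args) {
        String text = "don't stop_me now!";
        // prints [don't, stop_me, now]
        System.out.println(Arrays.toString(splitKeepingUnderscores(text)));
        // prints [don't, stop, me, now]
        System.out.println(Arrays.toString(splitOnUnderscores(text)));
    }
}
```

One thing to watch for: if the input starts with a separator (a leading space, say), split returns an empty string as the first element. Trailing empty strings are dropped, which is why the final "!" leaves nothing behind.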


For basic English characters, use:

words = Str.split("[^a-zA-Z0-9']+");

If you want to include English words with special characters (such as fiancé), or to handle languages that use non-English characters, go with \p{L}, which matches any Unicode letter:

words = Str.split("[^\\p{L}0-9']+");
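A quick comparison of the two patterns on an accented word, as a rough sketch. The class name UnicodeSplit, the method names, and the sample text are assumptions for illustration; the é is written as the escape \u00e9 so the example does not depend on source-file encoding.

```java
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
class UnicodeSplit {
    // ASCII-only pattern: any character outside a-z, A-Z, 0-9 and ' is a separator,
    // so accented letters split a word apart.
    static String[] splitAscii(String text) {
        return text.split("[^a-zA-Z0-9']+");
    }

    // Unicode-aware pattern: \p{L} matches any Unicode letter, so accented
    // letters stay inside the word.
    static String[] splitUnicode(String text) {
        return text.split("[^\\p{L}0-9']+");
    }

    public static void main(String[] args) {
        String text = "my fianc\u00e9's car";
        // prints [my, fianc, 's, car]
        System.out.println(Arrays.toString(splitAscii(text)));
        // prints the three words my, fiancé's, car
        System.out.println(Arrays.toString(splitUnicode(text)));
    }
}
```

Note that 0-9 in the second pattern still covers only ASCII digits, and \p{L} does not cover combining marks, so text stored in decomposed form (a plain e followed by a combining accent) would still be split at the accent.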