How to filter a string for unwanted characters using regex?
Edited based on your update:
result = dirtyString.replaceAll("[^a-zA-Z0-9]", ""); // strings are immutable, so keep the returned value
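If the same filter runs many times, it may be worth compiling the pattern once, since String.replaceAll compiles its regex on every call. A minimal sketch of that idea, assuming a hypothetical helper class named AlphanumericFilter (the class and method names are illustrative, not from the answer above):

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class AlphanumericFilter {
    // Compiled once and reused across calls.
    private static final Pattern NOT_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Returns a copy of the input with every character outside a-z, A-Z, 0-9 removed.
    static String keepAlphanumeric(String dirtyString) {
        return NOT_ALPHANUMERIC.matcher(dirtyString).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepAlphanumeric("He*llo, Wo#rld 42!")); // prints "HelloWorld42"
    }
}
```

The character class is a whitelist: anything not listed is dropped, including spaces, punctuation, and non-ASCII letters.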
If you're using Guava in your project (and if you're not, I believe you should consider it), the CharMatcher class handles this very nicely:
Your first example might be:
result = CharMatcher.WHITESPACE.removeFrom(dirtyString);
while your second might be:
result = CharMatcher.anyOf(" *#&").removeFrom(dirtyString);
// or alternatively
result = CharMatcher.noneOf(" *#&").retainFrom(dirtyString);
or, if you want to be more flexible with whitespace (tabs, etc.), you can combine matchers rather than writing your own:
CharMatcher illegal = CharMatcher.WHITESPACE.or(CharMatcher.anyOf("*#&"));
result = illegal.removeFrom(dirtyString);
or you might instead specify the legal characters, which, depending on your requirements, might be one of:
CharMatcher legal = CharMatcher.JAVA_LETTER; // based on Unicode char class
CharMatcher legal = CharMatcher.ASCII.and(CharMatcher.JAVA_LETTER); // only letters that are also ASCII, as in your examples
CharMatcher legal = CharMatcher.inRange('a', 'z'); // lowercase only
CharMatcher legal = CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z')); // either case
followed by legal.retainFrom(dirtyString), as above.
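For projects without Guava, the "retain only legal characters" idea can be approximated with the plain JDK. The sketch below is an assumption-laden stand-in, not Guava's implementation: the class AsciiLetterFilter and its method are hypothetical names, and the check (code point below 128 and Character.isLetter) is meant to mirror the ASCII-letters-only matcher shown above.

```java
// Hypothetical plain-JDK stand-in for retaining only ASCII letters.
class AsciiLetterFilter {
    // Returns a copy of the input containing only characters that are
    // both ASCII (below 128) and letters according to Character.isLetter.
    static String retainAsciiLetters(String dirtyString) {
        StringBuilder kept = new StringBuilder(dirtyString.length());
        for (int i = 0; i < dirtyString.length(); i++) {
            char c = dirtyString.charAt(i);
            if (c < 128 && Character.isLetter(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainAsciiLetters("abc 123 *#& XYZ")); // prints "abcXYZ"
    }
}
```

Swapping the condition inside the loop changes the set of legal characters, much as swapping the matcher does in the Guava version.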
Very nice, powerful API.
Use replaceAll.
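As a rough illustration of that suggestion, the sketch below removes a blacklist of characters with replaceAll. The class name UnwantedCharRemover and the sample inputs are assumptions made for the example; the characters space, *, #, and & are taken from the Guava answer above.

```java
// Hypothetical helper; names and sample inputs are illustrative only.
class UnwantedCharRemover {
    // Removes every whitespace character (spaces, tabs, newlines).
    static String stripWhitespace(String dirtyString) {
        return dirtyString.replaceAll("\\s", "");
    }

    // Removes whitespace plus the literal characters *, # and &.
    // Inside a character class these three need no escaping.
    static String stripListed(String dirtyString) {
        return dirtyString.replaceAll("[\\s*#&]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("a b\tc\n"));  // prints "abc"
        System.out.println(stripListed("a b\t*c#d&e"));   // prints "abcde"
    }
}
```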