Regex to remove all special characters from a string?
It really depends on your definition of special characters. I find that a whitelist rather than a blacklist is the best approach in most situations:
tmp = Regex.Replace(n, "[^0-9a-zA-Z]+", "");
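A minimal, self-contained console sketch of the whitelist approach, assuming C# 9+ top-level statements. The `Clean` helper and the sample input are hypothetical; only the `Regex.Replace` call and its pattern come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input.
string input = "Hello, World! #42";

Console.WriteLine(Clean(input)); // prints HelloWorld42

// Hypothetical helper: whitelist ASCII letters and digits and delete every
// run of other characters (the trailing + removes a whole run in one match).
static string Clean(string s) => Regex.Replace(s, "[^0-9a-zA-Z]+", "");
```

To keep extra characters, widen the whitelist, for example by adding a space or `_` inside the brackets.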
You should be careful with your current approach because the following two items will be converted to the same string and will therefore be indistinguishable:
"TRA-12:123"
"TRA-121:23"
This should do it:
[^a-zA-Z0-9]
Basically it matches all non-alphanumeric characters.
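[^a-zA-Z0-9]
is a character class that matches any non-alphanumeric character.
Alternatively, [^\w\d]
does almost the same thing, but not quite: \w already includes \d, and in .NET it also matches the underscore and non-ASCII letters and digits, so those characters are kept rather than removed.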
Usage:
string regExp = @"[^\w\d]"; // verbatim string: "\w" is not a valid C# escape and would not compile
string tmp = Regex.Replace(n, regExp, "");
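A small comparison sketch of the two patterns on the same input. The sample string is made up: it contains an underscore, a hyphen and an accented letter (written as the escape \u00E9 to avoid source-encoding issues).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: 'a', '_', 'b', '-', U+00E9 (e-acute), '1'.
string input = "a_b-\u00E91";

// ASCII whitelist: removes '_', '-' and the accented letter.
string ascii = Regex.Replace(input, "[^a-zA-Z0-9]", "");

// \w (default .NET semantics) covers Unicode letters, digits and '_',
// so only '-' is removed. Verbatim string so \w and \d reach the regex engine.
string word = Regex.Replace(input, @"[^\w\d]", "");

Console.WriteLine(ascii);                  // prints ab1
Console.WriteLine(word == "a_b\u00E91");   // prints True
```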
You can use:
string regExp = "\\W";
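This is similar to Daniel's [^a-zA-Z0-9], but not equivalent.
\W matches any nonword character, equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. Because the word characters include \p{Pc} (connector punctuation, such as the underscore) and non-ASCII letters and digits, \W leaves those in the string, whereas [^a-zA-Z0-9] removes them.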
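A sketch of how \W behaves on non-ASCII input under default .NET semantics, and how RegexOptions.ECMAScript (which limits \w to [a-zA-Z_0-9]) changes that. The sample string is hypothetical. Note that the underscore survives in both cases.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: "café_au-lait!" with the accented letter written as \u00E9.
string input = "caf\u00E9_au-lait!";

// Default semantics: \W removes '-' and '!', but keeps '_' and the accented letter.
string unicode = Regex.Replace(input, "\\W", "");

// ECMAScript semantics: \w is [a-zA-Z_0-9], so \W now also removes the accented letter.
string ecma = Regex.Replace(input, "\\W", "", RegexOptions.ECMAScript);

Console.WriteLine(unicode == "caf\u00E9_aulait"); // prints True
Console.WriteLine(ecma);                          // prints caf_aulait
```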
For my purposes I wanted to keep only ASCII characters, so this worked (ASCII ends at \x7F, so that is the upper bound):
html = Regex.Replace(html, @"[^\x00-\x7F]+", "");
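A minimal sketch of the ASCII-only filter on a made-up HTML fragment. It uses a verbatim string so the regex engine itself parses \x00 and \x7F, with \x7F as the last ASCII code point.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical fragment with non-ASCII characters:
// U+00E9 (e-acute), U+00A0 (non-breaking space), U+2014 (em dash).
string html = "<p>Caf\u00E9\u00A0menu \u2014 open</p>";

// Remove every run of characters outside U+0000..U+007F.
html = Regex.Replace(html, @"[^\x00-\x7F]+", "");

Console.WriteLine(html); // prints <p>Cafmenu  open</p>
```

The double space is left where the dash was: removing characters does not collapse the surrounding whitespace.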