How to remove non-alphanumeric characters?
I need to remove every character from a string that isn't in the set a-z, A-Z, 0-9
and isn't a space.
Does anyone have a function to do this?
It sounds like you almost knew what you wanted to do already; you basically defined it as a regex.
$string = preg_replace("/[^A-Za-z0-9 ]/", '', $string);
For Unicode characters, use:
$string = preg_replace("/[^[:alnum:][:space:]]/u", '', $string);
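To make the difference between the two patterns concrete, here is a small sketch that runs both on the same input. The sample string and variable names are made up for illustration, and it assumes the script file is saved as UTF-8.

```php
<?php
$input = "Crème brûlée: 2 for £5!";

// ASCII-only class: every byte outside A-Z, a-z, 0-9 and space is removed,
// so accented letters are dropped as well.
$ascii = preg_replace('/[^A-Za-z0-9 ]/', '', $input);

// POSIX classes with the u modifier: accented letters are kept, and
// [:space:] keeps tabs and newlines in addition to plain spaces.
$unicode = preg_replace('/[^[:alnum:][:space:]]/u', '', $input);

echo $ascii, "\n";   // Crme brle 2 for 5
echo $unicode, "\n"; // Crème brûlée 2 for 5
```

Note that preg_replace() returns the new string rather than changing its argument, so the result has to be assigned or printed.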
A regular expression is your answer.
$str = preg_replace('/[^a-z\d ]/i', '', $str);
- The i modifier stands for case-insensitive.
- ^ at the start of a character class negates it, so the class matches any character that is not listed.
- \d matches any digit.
- a-z matches all characters between a and z. Because of the i modifier, you don't have to specify both a-z and A-Z.
- After \d there is a space, so spaces are allowed in this regex.
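As a quick check of what the i modifier contributes, the following sketch runs the same pattern with and without it. The input string and variable names are invented for the example.

```php
<?php
$input = 'PHP_8.3 Rocks!';

// With i, the range a-z also covers A-Z, so uppercase letters survive.
$withFlag = preg_replace('/[^a-z\d ]/i', '', $input);

// Without i, uppercase letters fall outside a-z and are removed too.
$withoutFlag = preg_replace('/[^a-z\d ]/', '', $input);

echo $withFlag, "\n";    // PHP83 Rocks
echo $withoutFlag, "\n"; // 83 ocks
```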
If you need to support other languages, instead of the typical A-Z, you can use the following:
$string = preg_replace('/[^\p{L}\p{N} ]+/u', '', $string);
- [^\p{L}\p{N} ] defines a negated character class (it matches any character that is not listed), made up of:
  - \p{L}: a letter from any language.
  - \p{N}: a numeric character in any script.
  - a space character.
- + greedily matches the character class between one and unlimited times.
- The u modifier makes PCRE treat the pattern and the subject as UTF-8. Without it, multibyte characters are processed byte by byte and can be corrupted.
This will preserve letters and numbers from other languages and scripts as well as A-Z:
preg_replace('/[^\p{L}\p{N} ]+/u', '', 'hello-world'); // helloworld
preg_replace('/[^\p{L}\p{N} ]+/u', '', 'abc@~#123-+=öäå'); // abc123öäå
preg_replace('/[^\p{L}\p{N} ]+/u', '', '你好世界!@£$%^&*()'); // 你好世界
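One thing worth knowing when using the u modifier: preg_replace() returns null if the subject is not valid UTF-8. A minimal sketch of a wrapper that exposes this follows; the function name stripToLettersAndDigits is hypothetical, and the script is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: returns the cleaned string, or null when the
// input is not valid UTF-8 (preg_replace() signals errors with null).
function stripToLettersAndDigits(string $text): ?string
{
    // u makes PCRE treat both the pattern and the subject as UTF-8.
    return preg_replace('/[^\p{L}\p{N} ]+/u', '', $text);
}

$clean = stripToLettersAndDigits('Größe: 42 cm!');
$broken = stripToLettersAndDigits("bad \xFF byte"); // \xFF is never valid in UTF-8

var_dump($clean);  // string "Größe 42 cm"
var_dump($broken); // NULL
```

Callers can check for null (or inspect preg_last_error()) before using the result, instead of silently continuing with an empty value.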
Note: This is a very old, but still relevant question. I am answering purely to provide supplementary information that may be useful to future visitors.
Here's a really simple regex for that:
\W|_
Use it with a forward-slash (/) delimiter. Note that \W also matches whitespace, so this removes spaces as well:
$string = preg_replace("/\W|_/", '', $string);
Test it here with this great tool that explains what the regex is doing:
http://www.regexr.com/
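For comparison, here is a short sketch of how \W|_ behaves on a string containing spaces, next to a variant that keeps them. The variant pattern is not from the answer above; it is one possible adaptation, and the input string is made up.

```php
<?php
$input = 'snake_case and spaces!';

// \W matches any non-word character, including spaces. The underscore
// counts as a word character, so it needs its own alternative.
$noSpaces = preg_replace('/\W|_/', '', $input);

// Assumed variant: exclude the space from the characters to remove.
$keepSpaces = preg_replace('/[^\w ]|_/', '', $input);

echo $noSpaces, "\n";   // snakecaseandspaces
echo $keepSpaces, "\n"; // snakecase and spaces
```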