How to remove all non-alpha numeric characters from a string in MySQL?
Using MySQL 8.0 or higher
Courtesy of michal.jakubeczy's answer below, replacing by Regex is now supported by MySQL:
UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')
Using MySQL 5.7 or lower
Regex isn't supported here. I had to create my own function called alphanum which stripped the chars for me:
DROP FUNCTION IF EXISTS alphanum;
DELIMITER |
CREATE FUNCTION alphanum( str CHAR(255) ) RETURNS CHAR(255) DETERMINISTIC
BEGIN
DECLARE i, len SMALLINT DEFAULT 1;
DECLARE ret CHAR(255) DEFAULT '';
DECLARE c CHAR(1);
IF str IS NOT NULL THEN
SET len = CHAR_LENGTH( str );
REPEAT
BEGIN
SET c = MID( str, i, 1 );
IF c REGEXP '[[:alnum:]]' THEN
SET ret=CONCAT(ret,c);
END IF;
SET i = i + 1;
END;
UNTIL i > len END REPEAT;
ELSE
SET ret='';
END IF;
RETURN ret;
END |
DELIMITER ;
Now I can do:
select 'This works finally!', alphanum('This works finally!');
and I get:
+---------------------+---------------------------------+
| This works finally! | alphanum('This works finally!') |
+---------------------+---------------------------------+
| This works finally! | Thisworksfinally |
+---------------------+---------------------------------+
1 row in set (0.00 sec)
Hurray!
From a performance point of view, (and on the assumption that you read more than you write)
I think the best way would be to pre calculate and store a stripped version of the column, This way you do the transform less.
You can then put an index on the new column and get the database to do the work for you.
SELECT teststring REGEXP '[[:alnum:]]+';
SELECT * FROM testtable WHERE test REGEXP '[[:alnum:]]+';
See: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Scroll down to the section that says: [:character_class:]
If you want to manipulate strings the fastest way will be to use a str_udf, see:
https://github.com/hholzgra/mysql-udf-regexp
Since MySQL 8.0 you can use regular expression to remove non alphanumeric characters from a string. There is method REGEXP_REPLACE
Here is the code to remove non-alphanumeric characters:
UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')