Remove HTML tags from record

Try this solution. I haven't tried it myself, but apparently it works.

source: http://forums.mysql.com/read.php?52,177343,177985#msg-177985

-- Relaxes the restrictions on creating stored functions when binary logging is enabled.
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC
BEGIN
  DECLARE iStart, iEnd, iLength int;
  -- Keep going while there is a '<' with a '>' somewhere after it.
  WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
    SET iStart = Locate( '<', Dirty );
    SET iEnd = Locate( '>', Dirty, iStart );
    SET iLength = ( iEnd - iStart ) + 1;
    -- Cut out everything from '<' through '>'.
    SET Dirty = Insert( Dirty, iStart, iLength, '' );
  END WHILE;
  RETURN Dirty;
END;
|
DELIMITER ;
SELECT fnStripTags('this <html>is <b>a test</b>, nothing more</html>');
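
To apply the function to stored rows rather than a literal, something like the following should work. This is only a sketch: the table `articles` and its columns `id` and `body` are made-up names, not from the original post. Also note that the parameter is declared as varchar(4000), so longer values will be truncated or rejected, depending on the SQL mode.

```sql
-- Hypothetical table and columns (articles, id, body); adjust to your schema.
-- Show the stripped text next to the original value.
SELECT id, body, fnStripTags(body) AS body_text
FROM articles;

-- Search only the visible text, ignoring anything inside tags.
SELECT id
FROM articles
WHERE fnStripTags(body) LIKE '%mytext%';
```

The function is evaluated for every row, so expect a full table scan on large tables.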

Here's my implementation of a strip_tags function:

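DELIMITER |
CREATE FUNCTION `strip_tags`($str text) RETURNS text
DETERMINISTIC
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    -- A NULL input would otherwise never hit the RETURN inside the loop.
    IF $str IS NULL THEN RETURN NULL; END IF;
    LOOP
        -- Find the next "<" at or after the current position; stop when none is left.
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        -- Find the closing ">"; if there is none, remove only the lone "<".
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;
|
DELIMITER ;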

I made sure it removes unmatched opening brackets (just the "<" character itself) because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set
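
The two functions in this thread treat an unmatched bracket differently, which can be checked with a few literals. A small sketch, assuming both fnStripTags and strip_tags have been created as posted above. The results in the comments come from tracing the code by hand, not from a live server.

```sql
-- Unmatched "<": strip_tags removes the lone bracket, fnStripTags leaves the string alone.
SELECT strip_tags('a < b');   -- 'a  b' (two spaces remain)
SELECT fnStripTags('a < b');  -- 'a < b'

-- Unpaired ">": both functions leave it in place.
SELECT strip_tags('a > b');   -- 'a > b'
SELECT fnStripTags('a > b');  -- 'a > b'
```

Both functions treat any "<" ... ">" pair as a tag, so plain text such as '1 < 2 and 3 > 2' loses everything between the two brackets.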

Enjoy.


If your content always starts with a tag (<body>, etc.), try this:

SELECT * FROM table_name WHERE column_name REGEXP '>[^<]*mytext';
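
To see what this pattern does and does not accept, it can be tried on literal strings first. The sample strings below are invented for illustration. REGEXP returns 1 for a match and 0 otherwise.

```sql
-- "mytext" appears in the text after a tag's closing ">": match.
SELECT '<p>some mytext here</p>' REGEXP '>[^<]*mytext';        -- 1

-- "mytext" appears only inside a tag (an attribute value): no match.
SELECT '<a href="mytext.html">link</a>' REGEXP '>[^<]*mytext'; -- 0

-- No tag in front of the text at all: no match, which is why this
-- approach only works if the content always starts with a tag.
SELECT 'mytext with no markup' REGEXP '>[^<]*mytext';          -- 0
```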

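This needs regular expression matching with a negative lookahead assertion: "mytext" not followed by the closing bracket of a tag, i.e. not inside a tag. MySQL's older regex library (before 8.0.4) doesn't support lookaround assertions, but MariaDB does. As far as I know, MySQL 8.0.4 and later use ICU, which supports lookahead too, so the same query should work there. Query for the question (written for MariaDB):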

SELECT * FROM table_name WHERE column_name REGEXP 'mytext(?![^<>]*>)';
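
The lookahead can be checked on literal strings before running it against a table. A quick sketch with invented sample strings, assuming a server whose regex engine supports lookahead. The results in the comments come from working through the pattern by hand.

```sql
-- Plain text, no tags at all: match (unlike the '>[^<]*mytext' pattern).
SELECT 'mytext with no markup' REGEXP 'mytext(?![^<>]*>)';          -- 1

-- "mytext" in element content; the next bracket after it is "<": match.
SELECT '<p>mytext</p>' REGEXP 'mytext(?![^<>]*>)';                  -- 1

-- "mytext" inside a tag; a ">" follows with no bracket in between: no match.
SELECT '<a href="mytext.html">link</a>' REGEXP 'mytext(?![^<>]*>)'; -- 0
```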

Another solution is to remove some or all of the tags before matching. It's less efficient than REGEXP, but it also works. MySQL (starting with 8.0) and MariaDB (starting with 10.0.5) have a built-in REGEXP_REPLACE function. 'strip_html' is even the first example on the corresponding MariaDB documentation page. MySQL / MariaDB query for this approach:

SELECT * FROM table_name WHERE REGEXP_REPLACE(column_name, '<.+?>', '') LIKE '%mytext%';
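
The replacement step on its own looks like this. The second statement is a variation I'd consider, not part of the original answer: '<[^>]*>' cannot run past the first ">" and also matches tags that span several lines, and replacing with a space keeps words from adjacent elements apart. The sample strings are invented.

```sql
-- Pattern from the answer: lazy match from "<" to the nearest ">".
SELECT REGEXP_REPLACE('<p>some <b>mytext</b></p>', '<.+?>', '');
-- 'some mytext'

-- Variation (assumption, see above): character class instead of lazy dot,
-- and a space as the replacement.
SELECT REGEXP_REPLACE('<p>one</p><p>two</p>', '<[^>]*>', ' ');
-- ' one  two '
```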

Apart from that, the strings in the question mix data and presentation (markup). Searching them regularly this way is a waste of system resources.
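
One way to separate the two is to keep a plain-text copy of the column and search that instead. This is only a rough sketch under assumptions: the table `articles` and the columns `id`, `body`, and `body_text` are made-up names, and it needs a server with REGEXP_REPLACE.

```sql
-- Hypothetical table and columns (articles, id, body, body_text).
ALTER TABLE articles ADD COLUMN body_text TEXT;

-- Fill the plain-text copy once; repeat for rows that change later.
UPDATE articles
SET body_text = REGEXP_REPLACE(body, '<[^>]*>', ' ');

-- Later searches read the plain text and never touch the markup.
SELECT id
FROM articles
WHERE body_text LIKE '%mytext%';
```

The stripping cost is then paid once per write instead of once per search.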