Which is faster — INSTR or LIKE?

If your goal is to test if a string exists in a MySQL column (of type 'varchar', 'text', 'blob', etc) which of the following is faster / more efficient / better to use, and why?

Or, is there some other method that tops either of these?

INSTR( columnname, 'mystring' ) > 0

vs

columnname LIKE '%mystring%'

FULLTEXT searches are absolutely going to be faster, as kibibu noted in the comments above.

However:

mysql> select COUNT(ID) FROM table WHERE INSTR(Name,'search') > 0;
+-----------+
| COUNT(ID) |
+-----------+
|     40735 | 
+-----------+
1 row in set (5.54 sec)

mysql> select COUNT(ID) FROM table WHERE Name LIKE '%search%';
+-----------+
| COUNT(ID) |
+-----------+
|     40735 | 
+-----------+
1 row in set (5.54 sec)

In my tests, they perform exactly the same. They are both case-insensitive, and generally they perform full-table scans, a general no-no when dealing with high-performance MySQL.

Unless you are doing a prefix search on an indexed column:

mysql> select COUNT(ID) FROM table WHERE Name LIKE 'search%';
+-----------+
| COUNT(ID) |
+-----------+
|         7 | 
+-----------+
1 row in set (3.88 sec)

In which case, the LIKE with only a suffix wildcard is much faster.


MySQL - INSTR vs LOCATE vs LIKE vs REGEXP

For me the INSTR and LOCATE performed the fastest:

# 5.074 sec
SELECT BENCHMARK(100000000,INSTR('foobar','foo'));

# 5.086 sec
SELECT BENCHMARK(100000000,LOCATE('foo','foobar')); 

# 8.990 sec
SELECT BENCHMARK(100000000,'foobar' LIKE '%foo%');

# 14.433 sec
SELECT BENCHMARK(100000000,'foobar' REGEXP 'foo'); 

# 5.5.35-0ubuntu0.12.10.2 
SELECT @@version;

In the case of a "front wilcard" (i.e. a "LIKE '%...'" predicate) as seems to be the case here, INSTR and LIKE should perform roughly the same.

When the wildcard is not a "front wildcard", the LIKE approach should be faster, unless the wildcard is not very selective.

The reason why the type of wildcard and its selectivity matter is that a predicate with INSTR() will systematically result in a table scan (SQL cannot make any assumptions about the semantics of INSTR), whereby SQL can leverage its understanding of the semantics of the LIKE predicate to maybe use an index to help it only test a reduced set of possible matches.

As suggested in comment under the question itself, a Full Text index will be much faster. The difference depends on the specific distribution of words within the text, and also the overall table size, etc. but expect anything from twice as fast to maybe 10 times as fast.

A possible downside of using at fulltext index, in addition to the general overhead for creating such an index, is that unless one is very careful in configuring this index (ex: defining the stop word list, using specific search syntax to avoid inflectional forms and the like...), there may be cases where the results provided by FullText will not be as expected. For example, searching for a "SAW" (a tool to cut wood), one may get a lot of hits for records including the verb "to see", in its various conjugated forms.
Of course, these linguistic-aware features of fulltext indexes can typically be overridden and also one may consider that such features are effectively a advantage, not a drawback. I just mention this here since we're comparing this to a plain wildcard search.


There is little to add to razzed's test. But apparently using regexp does incur a much heavier processing load, unlike what Seth points out in his comment.

The following tests assume that you set query_caching to On in my.ini

query_cache_type = 1
query_cache_size = 64M

Tests

  • The timings show the average performance, out of three measurements (with the cache cleared intermittently):

  • LIKE

    SELECT * FROM `domain_model_offers` WHERE `description` LIKE '%inform%' LIMIT 0 , 30
    

    Initial: 0.0035s
    Cached: 0.0005s

  • REGEXP

    SELECT * FROM `domain_model_offers` WHERE `description` REGEXP 'inform' LIMIT 0 , 30
    

    Initial: 0.01s
    Cached: 0.0004s

Result

LIKE or INSTR is definitely faster than REGEXP.

Though minimal, the cache timing difference is probably sufficient to warrant further investigation.

On a probably configured MySQL system, fulltext indexing should generally be always faster or at least on par with a nonindexed search. So use indexing, especially on long human language texts, regardless of intermittent markup code.