Difference between LIKE and ~ in Postgres
Solution 1:
~
is the regular expression operator, and has the capabilities implied by that. You can specify a full range of regular expression wildcards and quantifiers; see the documentation for details. It is certainly more powerful than LIKE
, and should be used when that power is needed, but they serve different purposes.
Solution 2:
There is nothing wrong with LIKE
and, IMO, no reason to favour ~
over it. Rather the opposite. LIKE
is SQL-standard. So is SIMILAR TO
, but it isn't widely supported. PostgreSQL's ~ operator
(or posix regular expression matching operator) isn't SQL standard.
For that reason, I prefer to use LIKE
where it's expressive enough and I only use ~
when I need the power of full regular expressions. If I ever need to port databases it's one less thing that'll hurt. I've tended to use SIMILAR TO
when LIKE
isn't powerful enough, but after Erwin's comments I think I'll stop doing that and use ~
when LIKE
doesn't do the job.
Also, PostgreSQL can use a b-tree index for prefix searches (eg LIKE 'TEST%'
) with LIKE
or SIMILAR TO
if the database is in the C
locale or the index has text_pattern_ops
. Contrary to what I wrote earlier, Pg can also use such an index for a left-anchored posix regex, it just needs an explicit '^TEST.*' so the regex can only match from the beginning. My post earlier incorrectly stated that ~
couldn't use an index for a prefix search. With that difference eliminated it's really down to whether you want to stick to standard compliant features where possible or not.
See this demo SQLFiddle; note the different execution plans. Note the difference between ~ '1234.*'
and ~ '^1234.*'
.
Given sample data:
create table test (
blah text
);
insert into test (blah) select x::text from generate_series(1,10000) x;
create index test_blah_txtpat_idx ON test(blah text_pattern_ops);
note that ~
uses a seqscan even when it's substantially more expensive (artifically so due to enable_seqscan
) because it has no alternative, while LIKE
uses the index. However, a corrected ~
with a left anchor uses the index too:
regress=# SET enable_seqscan = 'f';
SET
regress=# explain select 1 from test where blah ~ '12.*';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on test (cost=10000000000.00..10000000118.69 rows=2122 width=0)
Filter: (blah ~ '12.*'::text)
(2 rows)
regress=# explain select 1 from test where blah like '12%';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4.55..46.76 rows=29 width=0)
Filter: (blah ~~ '12%'::text)
-> Bitmap Index Scan on test_blah_txtpat_idx (cost=0.00..4.54 rows=29 width=0)
Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)
regress=# explain select 1 from test where blah ~ '^12.*';
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=5.28..51.53 rows=101 width=0)
Filter: (blah ~ '^12.*'::text)
-> Bitmap Index Scan on test_blah_txtpat_idx (cost=0.00..5.25 rows=100 width=0)
Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)