Difference between LIKE and ~ in Postgres

Solution 1:

~ is the regular expression operator, and has the capabilities implied by that. You can specify a full range of regular expression wildcards and quantifiers; see the documentation for details. It is certainly more powerful than LIKE, and should be used when that power is needed, but they serve different purposes.

Solution 2:

There is nothing wrong with LIKE and, IMO, no reason to favour ~ over it. Rather the opposite. LIKE is SQL-standard. So is SIMILAR TO, but it isn't widely supported. PostgreSQL's ~ operator (or posix regular expression matching operator) isn't SQL standard.

For that reason, I prefer to use LIKE where it's expressive enough and I only use ~ when I need the power of full regular expressions. If I ever need to port databases it's one less thing that'll hurt. I've tended to use SIMILAR TO when LIKE isn't powerful enough, but after Erwin's comments I think I'll stop doing that and use ~ when LIKE doesn't do the job.

Also, PostgreSQL can use a b-tree index for prefix searches (eg LIKE 'TEST%') with LIKE or SIMILAR TO if the database is in the C locale or the index has text_pattern_ops. Contrary to what I wrote earlier, Pg can also use such an index for a left-anchored posix regex, it just needs an explicit '^TEST.*' so the regex can only match from the beginning. My post earlier incorrectly stated that ~ couldn't use an index for a prefix search. With that difference eliminated it's really down to whether you want to stick to standard compliant features where possible or not.

See this demo SQLFiddle; note the different execution plans. Note the difference between ~ '1234.*' and ~ '^1234.*'.

Given sample data:

create table test (
   blah text
);
insert into test (blah)  select x::text from generate_series(1,10000) x;
create index test_blah_txtpat_idx ON test(blah text_pattern_ops);

note that ~ uses a seqscan even when it's substantially more expensive (artifically so due to enable_seqscan) because it has no alternative, while LIKE uses the index. However, a corrected ~ with a left anchor uses the index too:

regress=# SET enable_seqscan = 'f';
SET
regress=# explain select 1 from test where blah ~ '12.*';
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on test  (cost=10000000000.00..10000000118.69 rows=2122 width=0)
   Filter: (blah ~ '12.*'::text)
(2 rows)
regress=# explain select 1 from test where blah like '12%';
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.55..46.76 rows=29 width=0)
   Filter: (blah ~~ '12%'::text)
   ->  Bitmap Index Scan on test_blah_txtpat_idx  (cost=0.00..4.54 rows=29 width=0)
         Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)
regress=# explain select 1 from test where blah ~ '^12.*';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=5.28..51.53 rows=101 width=0)
   Filter: (blah ~ '^12.*'::text)
   ->  Bitmap Index Scan on test_blah_txtpat_idx  (cost=0.00..5.25 rows=100 width=0)
         Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)