How to create an index to avoid sorting step used with DISTINCT?

PostgreSQL 9.2

I have the following table (tbl):

-------------------------------------------------------------
| id   |  mailing_id  |  recipient_id  |  delivery_state_id |
-------------------------------------------------------------
| PK   |   integer    |     integer    |       integer      |
-------------------------------------------------------------

Also, the I created the following index:

CREATE INDEX idx_name
  ON tbl
  USING btree
  (recipient_id);

Since, indexes in posgtresql have default sorting, I expected the query

SELECT DISTINCT recipient_id 
FROM tbl

can avoid the sort step. But running

EXPLAIN ANALYZE SELECT DISTINCT recipient_id 
FROM mailing.mailing_recipient mr

show me that it can't:

 Unique  (cost=1401370.66..1442288.31 rows=145798 width=4) (actual time=9377.410..11388.869 rows=1037472 loops=1) 
   ->  Sort  (cost=1401370.66..1421829.48 rows=8183530 width=4) (actual time=9377.408..10849.160 rows=8183160 loops=1) 
         Sort Key: recipient_id 
         Sort Method: external merge  Disk: 111968kB 
         ->  Seq Scan on tbl  (cost=0.00..126072.30 rows=8183530 width=4) (actual time=0.008..1073.771 rows=8183160 loops=1) 
 Total runtime: 11448.373 ms 

As you can see, there's still sorting.

Question: How can I create the index to avoid sorting step?


Solution 1:

Make sure your order by statement matches your indexes exactly, including NULLS LAST (or FIRST) in your queries.

Index Definition Without Specifying NULLS LAST in ORDER BY Statement With Specifying NULLS LAST on each of the ORDER BY Fields

Solution 2:

This surprises me; I would expect Postgres to be smarter than that. What happens with this version?

SELECT recipient_id 
FROM tbl
GROUP BY recipient_id;

What version of Postgres are you using? Postgres introduced index-only scans in version 9.2 (see here), which might explain the lack of use of the index. I can say that an index scan is used for distinct in 9.3.

Here is an explain in 9.3 on a similar query (select distinct totalprice from orders):

Unique  (cost=0.42..5505.62 rows=2794 width=8)
  ->  Index Only Scan using idx_orders_totalprice on orders  (cost=0.42..5023.16 rows=192983 width=8)"