quick random row selection in Postgres
I have a table in postgres that contains couple of millions of rows. I have checked on the internet and I found the following
SELECT myid FROM mytable ORDER BY RANDOM() LIMIT 1;
It works, but it's really slow... is there another way to make that query, or a direct way to select a random row without reading all the table? By the way 'myid' is an integer but it can be an empty field.
Solution 1:
You might want to experiment with OFFSET
, as in
SELECT myid FROM mytable OFFSET floor(random() * N) LIMIT 1;
The N
is the number of rows in mytable
. You may need to first do a SELECT COUNT(*)
to figure out the value of N
.
Update (by Antony Hatchkins)
You must use floor
here:
SELECT myid FROM mytable OFFSET floor(random() * N) LIMIT 1;
Consider a table of 2 rows; random()*N
generates 0 <= x < 2
and for example SELECT myid FROM mytable OFFSET 1.7 LIMIT 1;
returns 0 rows because of implicit rounding to nearest int.
Solution 2:
PostgreSQL 9.5 introduced a new approach for much faster sample selection: TABLESAMPLE
The syntax is
SELECT * FROM my_table TABLESAMPLE BERNOULLI(percentage);
SELECT * FROM my_table TABLESAMPLE SYSTEM(percentage);
This is not the optimal solution if you want only one row selected, because you need to know the COUNT of the table to calculate the exact percentage.
To avoid a slow COUNT and use fast TABLESAMPLE for tables from 1 row to billions of rows, you can do:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.000001) LIMIT 1;
-- if you got no result:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.00001) LIMIT 1;
-- if you got no result:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.0001) LIMIT 1;
-- if you got no result:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.001) LIMIT 1;
...
This might not look so elegant, but probably is faster than any of the other answers.
To decide whether you want to use BERNULLI oder SYSTEM, read about the difference at http://blog.2ndquadrant.com/tablesample-in-postgresql-9-5-2/