Random record from a database table (T-SQL)
Is there a succinct way to retrieve a random record from a sql server table?
I would like to randomize my unit test data, so am looking for a simple way to select a random id from a table. In English, the select would be "Select one id from the table where the id is a random number between the lowest id in the table and the highest id in the table."
I can't figure out a way to do it without have to run the query, test for a null value, then re-run if null.
Ideas?
Is there a succinct way to retrieve a random record from a sql server table?
Yes
SELECT TOP 1 * FROM table ORDER BY NEWID()
Explanation
A NEWID()
is generated for each row and the table is then sorted by it. The first record is returned (i.e. the record with the "lowest" GUID).
Notes
-
GUIDs are generated as pseudo-random numbers since version four:
The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers.
The algorithm is as follows:
- Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.
- Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.
- Set all the other bits to randomly (or pseudo-randomly) chosen values.
—A Universally Unique IDentifier (UUID) URN Namespace - RFC 4122
The alternative
SELECT TOP 1 * FROM table ORDER BY RAND()
will not work as one would think.RAND()
returns one single value per query, thus all rows will share the same value.While GUID values are pseudo-random, you will need a better PRNG for the more demanding applications.
Typical performance is less than 10 seconds for around 1,000,000 rows — of course depending on the system. Note that it's impossible to hit an index, thus performance will be relatively limited.
On larger tables you can also use TABLESAMPLE
for this to avoid scanning the whole table.
SELECT TOP 1 *
FROM YourTable
TABLESAMPLE (1000 ROWS)
ORDER BY NEWID()
The ORDER BY NEWID
is still required to avoid just returning rows that appear first on the data page.
The number to use needs to be chosen carefully for the size and definition of table and you might consider retry logic if no row is returned. The maths behind this and why the technique is not suited to small tables is discussed here
Also try your method to get a random Id between MIN(Id) and MAX(Id) and then
SELECT TOP 1 * FROM table WHERE Id >= @yourrandomid
It will always get you one row.
If you want to select large data the best way that I know is:
SELECT * FROM Table1
WHERE (ABS(CAST(
(BINARY_CHECKSUM
(keycol1, NEWID())) as int))
% 100) < 10
Source: MSDN