sql query distinct with Row_Number

I am fighting with the distinct keyword in sql. I just want to display all row numbers of unique (distinct) values in a column & so I tried:

SELECT DISTINCT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNum
FROM table
WHERE fid = 64

however the below code giving me the distinct values:

SELECT distinct id FROM table WHERE fid = 64

but when tried it with Row_Number.
then it is not working.


Solution 1:

This can be done very simple, you were pretty close already

SELECT distinct id, DENSE_RANK() OVER (ORDER BY  id) AS RowNum
FROM table
WHERE fid = 64

Solution 2:

Use this:

SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS RowNum FROM
    (SELECT DISTINCT id FROM table WHERE fid = 64) Base

and put the "output" of a query as the "input" of another.

Using CTE:

; WITH Base AS (
    SELECT DISTINCT id FROM table WHERE fid = 64
)

SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS RowNum FROM Base

The two queries should be equivalent.

Technically you could

SELECT DISTINCT id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS RowNum 
    FROM table
    WHERE fid = 64

but if you increase the number of DISTINCT fields, you have to put all these fields in the PARTITION BY, so for example

SELECT DISTINCT id, description,
    ROW_NUMBER() OVER (PARTITION BY id, description ORDER BY id) AS RowNum 
    FROM table
    WHERE fid = 64

I even hope you comprehend that you are going against standard naming conventions here, id should probably be a primary key, so unique by definition, so a DISTINCT would be useless on it, unless you coupled the query with some JOINs/UNION ALL...

Solution 3:

This article covers an interesting relationship between ROW_NUMBER() and DENSE_RANK() (the RANK() function is not treated specifically). When you need a generated ROW_NUMBER() on a SELECT DISTINCT statement, the ROW_NUMBER() will produce distinct values before they are removed by the DISTINCT keyword. E.g. this query

SELECT DISTINCT
  v, 
  ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

... might produce this result (DISTINCT has no effect):

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |
+---+------------+

Whereas this query:

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

... produces what you probably want in this case:

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| b |          2 |
| c |          3 |
| d |          4 |
| e |          5 |
+---+------------+

Note that the ORDER BY clause of the DENSE_RANK() function will need all other columns from the SELECT DISTINCT clause to work properly.

All three functions in comparison

Using PostgreSQL / Sybase / SQL standard syntax (WINDOW clause):

SELECT
  v,
  ROW_NUMBER() OVER (window) row_number,
  RANK()       OVER (window) rank,
  DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v

... you'll get:

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+

Solution 4:

Using DISTINCT causes issues as you add fields and it can also mask problems in your select. Use GROUP BY as an alternative like this:

SELECT id
      ,ROW_NUMBER() OVER (ORDER BY  id) AS RowNum
  FROM table
 where fid = 64
 group by id

Then you can add other interesting information from your select like this:

,count(*) as thecount

or

,max(description) as description