Select first row per group with groups created by 'ORDER' [duplicate]
Solution 1:
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead
As for normalised or not, it depends if you want to:
- maintain status in 2 places
- preserve status history
- ...
As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.
Solution 2:
I just learned how to use cross apply
. Here's how to use it in this scenario:
select d.DocumentID, ds.Status, ds.DateCreated
from Documents as d
cross apply
(select top 1 Status, DateCreated
from DocumentStatusLogs
where DocumentID = d.DocumentId
order by DateCreated desc) as ds
Solution 3:
I know this is an old thread but the TOP 1 WITH TIES
solutions is quite nice and might be helpful to some reading through the solutions.
select top 1 with ties
DocumentID
,Status
,DateCreated
from DocumentStatusLogs
order by row_number() over (partition by DocumentID order by DateCreated desc)
The select top 1 with ties
clause tells SQL Server that you want to return the first row per group. But how does SQL Server know how to group up the data? This is where the order by row_number() over (partition by DocumentID order by DateCreated desc
comes in. The column/columns after partition by
defines how SQL Server groups up the data. Within each group, the rows will be sorted based on the order by
columns. Once sorted, the top row in each group will be returned in the query.
More about the TOP clause can be found here.
Solution 4:
I've done some timings over the various recommendations here, and the results really depend on the size of the table involved, but the most consistent solution is using the CROSS APPLY These tests were run against SQL Server 2008-R2, using a table with 6,500 records, and another (identical schema) with 137 million records. The columns being queried are part of the primary key on the table, and the table width is very small (about 30 bytes). The times are reported by SQL Server from the actual execution plan.
Query Time for 6500 (ms) Time for 137M(ms)
CROSS APPLY 17.9 17.9
SELECT WHERE col = (SELECT MAX(COL)…) 6.6 854.4
DENSE_RANK() OVER PARTITION 6.6 907.1
I think the really amazing thing was how consistent the time was for the CROSS APPLY regardless of the number of rows involved.