Row_number does not work on timesheet data for same shift start date but different shift end date

I'm working on timesheet data where I have employee shift_start_date, shift_start_time, shift_end_date.

I've created a row_num column based on shift_start_time:

SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY id, shift_start_date 
                       ORDER BY shift_start_time asc) AS row_num
FROM 
    table

The data looks like this:

enter image description here

As you can see the row_num is incorrect for id 101. The employee starts the shift on '2020-10-04' but finishes it on '2020-10-05', so when ordering the rows by shift_start_time. It takes the 0:00 starttime as the the time that person has started the shift for that day and assigns row_num = 1 for that row.

I want my data to look like this, where the row_number_corrected is the correct row_number:

enter image description here

Not sure how to fix this. Thank you so much for your help.


You need to find the actual start date. Like for the last row, the actual start date is 2020-10-05

One way is to use recursive cte.

The anchor section finds the starting row. The starting row does not have another row with same shift_end_time'. Subsequent rows is join by shift_end_time = shift_start_time`

To get the actual start date, compare current row shift_start_time with next row. If next row shift_start_time is smaller, add 1 to the act_start_date

Finally concatenate act_start_date with shift_start_time to form datetime and used that for ordering

with rcte as
(
    -- Anchor
    select s.id, s.shift_start_date, s.shift_start_time, s.shift_end_time,
           act_start_date = s.shift_start_date
    from   shifts s
    where  not exists
           (
               select *
               from   shifts x
               where  x.id = s.id
               and    x.shift_start_date = s.shift_start_date
               and    x.shift_end_time   = s.shift_start_time
           )
           
    union all
    
    -- recursive
    select  s.id, s.shift_start_date, s.shift_start_time, s.shift_end_time,
            act_start_date = case when r.shift_start_time < s.shift_start_time
                                  then r.act_start_date
                                  else dateadd(day, 1, r.act_start_date)
                                  end
    from    rcte r
            inner join shifts s on  r.id = s.id
                                and r.shift_end_time = s.shift_start_time
)
select *,
       row_num = row_number() over (partition by id, shift_start_date
                                        order by convert(datetime, act_start_date) 
                                               + convert(datetime, shift_start_time))
from   rcte

demo

Result :

id shift_start_date shift_start_time shift_end_time act_start_date row_num
100 2020-10-03 09:00:00 12:00:00 2020-10-03 1
100 2020-10-03 12:00:00 13:00:00 2020-10-03 2
100 2020-10-03 13:00:00 17:30:00 2020-10-03 3
101 2020-10-04 19:00:00 23:00:00 2020-10-04 1
101 2020-10-04 23:00:00 00:00:00 2020-10-04 2
101 2020-10-04 00:00:00 06:00:00 2020-10-05 3