Row_number does not work on timesheet data for same shift start date but different shift end date
I'm working on timesheet data where I have employee shift_start_date
, shift_start_time
, shift_end_date
.
I've created a row_num
column based on shift_start_time
:
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id, shift_start_date
ORDER BY shift_start_time asc) AS row_num
FROM
table
The data looks like this:
As you can see the row_num
is incorrect for id 101. The employee starts the shift on '2020-10-04' but finishes it on '2020-10-05', so when ordering the rows by shift_start_time
. It takes the 0:00 starttime as the the time that person has started the shift for that day and assigns row_num = 1
for that row.
I want my data to look like this, where the row_number_corrected
is the correct row_number
:
Not sure how to fix this. Thank you so much for your help.
You need to find the actual start date. Like for the last row, the actual start date is 2020-10-05
One way is to use recursive cte.
The anchor section finds the starting row. The starting row does not have another row with same shift_end_time'. Subsequent rows is join by
shift_end_time = shift_start_time`
To get the actual start date, compare current row shift_start_time
with next row. If next row shift_start_time
is smaller, add 1 to the act_start_date
Finally concatenate act_start_date
with shift_start_time
to form datetime and used that for ordering
with rcte as
(
-- Anchor
select s.id, s.shift_start_date, s.shift_start_time, s.shift_end_time,
act_start_date = s.shift_start_date
from shifts s
where not exists
(
select *
from shifts x
where x.id = s.id
and x.shift_start_date = s.shift_start_date
and x.shift_end_time = s.shift_start_time
)
union all
-- recursive
select s.id, s.shift_start_date, s.shift_start_time, s.shift_end_time,
act_start_date = case when r.shift_start_time < s.shift_start_time
then r.act_start_date
else dateadd(day, 1, r.act_start_date)
end
from rcte r
inner join shifts s on r.id = s.id
and r.shift_end_time = s.shift_start_time
)
select *,
row_num = row_number() over (partition by id, shift_start_date
order by convert(datetime, act_start_date)
+ convert(datetime, shift_start_time))
from rcte
demo
Result :
id | shift_start_date | shift_start_time | shift_end_time | act_start_date | row_num |
---|---|---|---|---|---|
100 | 2020-10-03 | 09:00:00 | 12:00:00 | 2020-10-03 | 1 |
100 | 2020-10-03 | 12:00:00 | 13:00:00 | 2020-10-03 | 2 |
100 | 2020-10-03 | 13:00:00 | 17:30:00 | 2020-10-03 | 3 |
101 | 2020-10-04 | 19:00:00 | 23:00:00 | 2020-10-04 | 1 |
101 | 2020-10-04 | 23:00:00 | 00:00:00 | 2020-10-04 | 2 |
101 | 2020-10-04 | 00:00:00 | 06:00:00 | 2020-10-05 | 3 |