Find highest-ranked pieces of intersecting rows in SQL
This probably falls into that "I could google it if only I knew the proper name for it" category, but here goes.
I have a table of intersecting activity timeframes, and a ranking for those activities. I need to get remaining timeframes for each row after each higher-ranked intersection has been removed.
I've tried joining each row with each intersecting row, but adjusting the start and end times based on each individual intersection doesn't account for higher-level intersections. Here's what I got to before I decided I'm probably approaching this wrong conceptually.
select
Activity
,rank
,case
when b.startTime <= a.startTime then b.endTime
when b.endtime >= a.endTime then b.starttime
else a.startTime
end as starttimeAdj
,case
when b.starttime <= b.startTime then a.endtime
when b.endtime >= a.endTime then b.starttime
else a.endTime
end as endtimeAdj
from myTable as a
left join myTable as b
on b.startTime< a.endTime
and b.endTime> a.startTime
and b.Activity!= a.Activity
and b.Rank<a.rank
where
b.segID is null or
not (b.starttime <= a.startTime and b.endtime >= a.endTime)
order by a.starttime,a.rank,b.starttime,b.rank
Here's an example. This is the starting data:
Activity rank StartTime EndTime
Meeting 1 8:00 9:00
Startup 2 8:00 8:10
Shift 4 8:00 19:00
Break1 3 10:15 10:30
Break2 3 17:00 17:15
This is what I'm trying to get to:
Activity rank StartTime EndTime
Meeting 1 8:00 9:00
Shift 4 9:00 10:15
Break1 3 10:15 10:30
Shift 4 10:30 17:00
Break2 3 17:00 17:15
Shift 4 17:15 19:00
The Startup activity is gone because it is fully encompassed by a higher-ranked event. The Shift activity, as the lowest rank, has been fragmented by each thing that intersects it, with only the non-intersecting periods remaining.
Visual representation by rank:
<--Meeting-->
<Startup>
<Break1> <Break2>
<-----------------------------------------------Shift------------------------------------>
Becomes
<--Meeting--><--Shift-><Break><-----------------Shift--------------------><Break><-Shift->
Solution 1:
I found an edge (pun intended) case (!), as I decided to use this SQL for my own purposes.
Although the first solution works for the given data, I found a problem when two activities are back to back, with end time of the 1st activity equal to the start time of the next.
I'll add my correction here, and leave the original answer / SQL at the end:
Updated fiddle with corrected SQL
Here's the updated data that causes a problem:
INSERT INTO segments VALUES
(1, 'Meeting' , 1, 800, 900)
, (2, 'Startup' , 2, 800, 810)
, (3, 'Shift' , 4, 800, 1900)
, (4, 'Break1' , 3, 1015, 1030)
, (5, 'Break2' , 3, 1700, 1715)
, (6, 'Meeting2', 1, 900, 915) -- added a 15 minute meeting at 0900
;
The problem is when leading and trailing edges of activities have the same time. In those cases, we need to rank those rows in the intermediate result so that the trailing edge can be pruned from the result, leaving the leading edge of the next activity to drive our final result.
Here's the new solution (see the new fiddle to see the problem and correction):
WITH edge1 AS (
SELECT t2.Activity
, t2.rank
, t1.StartTime
, t2.segID
, ROW_NUMBER() OVER (PARTITION BY t1.StartTime ORDER BY t2.rank) AS rnk
FROM segments AS t1
LEFT JOIN segments AS t2
ON t1.StartTime >= t2.StartTime
AND t1.StartTime < t2.EndTime
)
, edge2 AS (
SELECT t2.Activity
, t2.rank
, t1.EndTime
, t2.segID
, ROW_NUMBER() OVER (PARTITION BY t1.EndTime ORDER BY t2.rank) AS rnk
FROM segments AS t1
LEFT JOIN segments AS t2
ON t1.EndTime > t2.StartTime
AND t1.EndTime < t2.EndTime
AND t1.segID <> t2.segID
)
, xall1 AS (
SELECT 1 AS edge, e.* FROM edge1 AS e WHERE rnk = 1
UNION
SELECT 2 AS edge, e.* FROM edge2 AS e WHERE rnk = 1
)
, xall AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY StartTime ORDER BY edge) AS ernk2
FROM xall1
)
, xprune AS (
SELECT *
, CASE WHEN LAG(segID) OVER (ORDER BY StartTime) = segID THEN 1 ELSE 0 END AS prune
FROM xall
WHERE ernk2 = 1
)
, final AS (
SELECT *
, LEAD(StartTime) OVER (ORDER BY StartTime) AS EndTime
FROM xprune
WHERE prune = 0
)
SELECT Activity, rank, StartTime, EndTime
FROM final
WHERE Activity IS NOT NULL
ORDER BY StartTime
;
Correct result with the new meeting2:
Activity | rank | StartTime | EndTime |
---|---|---|---|
Meeting | 1 | 800 | 900 |
Meeting2 | 1 | 900 | 915 |
Shift | 4 | 915 | 1015 |
Break1 | 3 | 1015 | 1030 |
Shift | 4 | 1030 | 1700 |
Break2 | 3 | 1700 | 1715 |
Shift | 4 | 1715 | 1900 |
Original answer / SQL below:
I'll bite. Here's something to try. I treated the leading edge and trailing edge of each activity slightly differently, since we want to include the current activity when handling the leading edge results, and exclude the current activity when handling the trailing edge results.
CTE term | Description |
---|---|
edge1 | Leading edge start time rank of overlapping rows |
edge2 | Trailing edge end time rank of overlapping rows |
xall | Just UNION the leading and trailing edge times of the best ranked rows |
xprune | Determine which adjacent rows (having the same segID) can be pruned |
final | Reduce to the final set of rows while calculating the new end times |
The last query expression selects the columns of interest and removes the rows which indicate gaps in activity.
Fiddle for SQL Server
The SQL:
WITH edge1 AS (
SELECT t2.Activity
, t2.rank
, t1.StartTime
, t2.segID
, ROW_NUMBER() OVER (PARTITION BY t1.StartTime ORDER BY t2.rank) AS rnk
FROM segments AS t1
LEFT JOIN segments AS t2
ON t1.StartTime >= t2.StartTime
AND t1.StartTime < t2.EndTime
)
, edge2 AS (
SELECT t2.Activity
, t2.rank
, t1.EndTime
, t2.segID
, ROW_NUMBER() OVER (PARTITION BY t1.EndTime ORDER BY t2.rank) AS rnk
FROM segments AS t1
LEFT JOIN segments AS t2
ON t1.EndTime > t2.StartTime
AND t1.EndTime < t2.EndTime
AND t1.segID <> t2.segID
)
, xall AS (
SELECT * FROM edge1 WHERE rnk = 1
UNION
SELECT * FROM edge2 WHERE rnk = 1
)
, xprune AS (
SELECT *
, CASE WHEN LAG(segID) OVER (ORDER BY StartTime) = segID THEN 1 ELSE 0 END AS prune
FROM xall
)
, final AS (
SELECT *
, LEAD(StartTime) OVER (ORDER BY StartTime) AS EndTime
FROM xprune
WHERE prune = 0
)
SELECT Activity, rank, StartTime, EndTime
FROM final
WHERE Activity IS NOT NULL
ORDER BY StartTime
;
The result:
+----------+------+-----------+---------+
| Activity | rank | StartTime | EndTime |
+----------+------+-----------+---------+
| Meeting | 1 | 800 | 900 |
| Shift | 4 | 900 | 1015 |
| Break1 | 3 | 1015 | 1030 |
| Shift | 4 | 1030 | 1700 |
| Break2 | 3 | 1700 | 1715 |
| Shift | 4 | 1715 | 1900 |
+----------+------+-----------+---------+