Find highest-ranked pieces of intersecting rows in SQL

This probably falls into that "I could google it if only I knew the proper name for it" category, but here goes.

I have a table of intersecting activity timeframes, and a ranking for those activities. I need to get remaining timeframes for each row after each higher-ranked intersection has been removed.

I've tried joining each row with each intersecting row, but adjusting the start and end times based on each individual intersection doesn't account for higher-level intersections. Here's what I got to before I decided I'm probably approaching this wrong conceptually.

select
    Activity
    ,rank
    ,case
        when b.startTime <= a.startTime then b.endTime 
        when b.endtime >= a.endTime then b.starttime
        else a.startTime
        end as starttimeAdj
    ,case
        when b.starttime <= b.startTime then a.endtime
        when b.endtime >= a.endTime then b.starttime
        else a.endTime
        end as endtimeAdj
from myTable as a
left join myTable as b
    on b.startTime< a.endTime
    and b.endTime> a.startTime
    and b.Activity!= a.Activity
    and b.Rank<a.rank
where
    b.segID is null or
    not (b.starttime <= a.startTime and b.endtime >= a.endTime)
order by a.starttime,a.rank,b.starttime,b.rank

Here's an example. This is the starting data:

Activity    rank    StartTime   EndTime
Meeting     1       8:00        9:00
Startup     2       8:00        8:10
Shift       4       8:00        19:00
Break1      3       10:15       10:30
Break2      3       17:00       17:15

This is what I'm trying to get to:

Activity    rank    StartTime   EndTime
Meeting     1       8:00        9:00
Shift       4       9:00        10:15
Break1      3       10:15       10:30
Shift       4       10:30       17:00
Break2      3       17:00       17:15
Shift       4       17:15       19:00

The Startup activity is gone because it is fully encompassed by a higher-ranked event. The Shift activity, as the lowest rank, has been fragmented by each thing that intersects it, with only the non-intersecting periods remaining.

Visual representation by rank:

<--Meeting-->
<Startup>           
                      <Break1>                                            <Break2>
<-----------------------------------------------Shift------------------------------------>

Becomes

<--Meeting--><--Shift-><Break><-----------------Shift--------------------><Break><-Shift->

Solution 1:

I found an edge (pun intended) case (!), as I decided to use this SQL for my own purposes.

Although the first solution works for the given data, I found a problem when two activities are back to back, with end time of the 1st activity equal to the start time of the next.

I'll add my correction here, and leave the original answer / SQL at the end:

Updated fiddle with corrected SQL

Here's the updated data that causes a problem:

INSERT INTO segments VALUES
  (1,      'Meeting' , 1,       800,       900)
, (2,      'Startup' , 2,       800,       810)
, (3,      'Shift'   , 4,       800,      1900)
, (4,      'Break1'  , 3,      1015,      1030)
, (5,      'Break2'  , 3,      1700,      1715)
, (6,      'Meeting2', 1,       900,       915)  -- added a 15 minute meeting at 0900
;

The problem is when leading and trailing edges of activities have the same time. In those cases, we need to rank those rows in the intermediate result so that the trailing edge can be pruned from the result, leaving the leading edge of the next activity to drive our final result.

Here's the new solution (see the new fiddle to see the problem and correction):

WITH edge1 AS (
        SELECT t2.Activity
             , t2.rank
             , t1.StartTime
             , t2.segID
             , ROW_NUMBER() OVER (PARTITION BY t1.StartTime ORDER BY t2.rank) AS rnk
          FROM      segments AS t1
          LEFT JOIN segments AS t2
            ON t1.StartTime >= t2.StartTime
           AND t1.StartTime <  t2.EndTime
     )
   , edge2 AS (
        SELECT t2.Activity
             , t2.rank
             , t1.EndTime
             , t2.segID
             , ROW_NUMBER() OVER (PARTITION BY t1.EndTime ORDER BY t2.rank) AS rnk
          FROM      segments AS t1
          LEFT JOIN segments AS t2
            ON t1.EndTime > t2.StartTime
           AND t1.EndTime < t2.EndTime
           AND t1.segID <> t2.segID
     )
   , xall1 AS (
        SELECT 1 AS edge, e.* FROM edge1 AS e WHERE rnk = 1
         UNION
        SELECT 2 AS edge, e.* FROM edge2 AS e WHERE rnk = 1
     )
   , xall AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY StartTime ORDER BY edge) AS ernk2
          FROM xall1
     )
   , xprune AS (
        SELECT *
             , CASE WHEN LAG(segID) OVER (ORDER BY StartTime) = segID THEN 1 ELSE 0 END AS prune
          FROM xall
         WHERE ernk2 = 1
     )
   , final AS (
        SELECT *
             , LEAD(StartTime) OVER (ORDER BY StartTime) AS EndTime
          FROM xprune
         WHERE prune = 0
     )
SELECT Activity, rank, StartTime, EndTime
  FROM final
 WHERE Activity IS NOT NULL
 ORDER BY StartTime
;

Correct result with the new meeting2:

Activity	rank	StartTime	EndTime
Meeting	1	800	900
Meeting2	1	900	915
Shift	4	915	1015
Break1	3	1015	1030
Shift	4	1030	1700
Break2	3	1700	1715
Shift	4	1715	1900

Original answer / SQL below:

I'll bite. Here's something to try. I treated the leading edge and trailing edge of each activity slightly differently, since we want to include the current activity when handling the leading edge results, and exclude the current activity when handling the trailing edge results.

CTE term	Description
edge1	Leading edge start time rank of overlapping rows
edge2	Trailing edge end time rank of overlapping rows
xall	Just UNION the leading and trailing edge times of the best ranked rows
xprune	Determine which adjacent rows (having the same segID) can be pruned
final	Reduce to the final set of rows while calculating the new end times

The last query expression selects the columns of interest and removes the rows which indicate gaps in activity.

Fiddle for SQL Server

The SQL:

WITH edge1 AS (
        SELECT t2.Activity
             , t2.rank
             , t1.StartTime
             , t2.segID
             , ROW_NUMBER() OVER (PARTITION BY t1.StartTime ORDER BY t2.rank) AS rnk
          FROM      segments AS t1
          LEFT JOIN segments AS t2
            ON t1.StartTime >= t2.StartTime
           AND t1.StartTime <  t2.EndTime
     )
   , edge2 AS (
        SELECT t2.Activity
             , t2.rank
             , t1.EndTime
             , t2.segID
             , ROW_NUMBER() OVER (PARTITION BY t1.EndTime ORDER BY t2.rank) AS rnk
          FROM      segments AS t1
          LEFT JOIN segments AS t2
            ON t1.EndTime > t2.StartTime
           AND t1.EndTime < t2.EndTime
           AND t1.segID <> t2.segID
     )
   , xall AS (
        SELECT * FROM edge1 WHERE rnk = 1
         UNION
        SELECT * FROM edge2 WHERE rnk = 1
     )
   , xprune AS (
        SELECT *
             , CASE WHEN LAG(segID) OVER (ORDER BY StartTime) = segID THEN 1 ELSE 0 END AS prune
          FROM xall
     )
   , final AS (
        SELECT *
             , LEAD(StartTime) OVER (ORDER BY StartTime) AS EndTime
          FROM xprune
         WHERE prune = 0
     )
SELECT Activity, rank, StartTime, EndTime
  FROM final
 WHERE Activity IS NOT NULL
 ORDER BY StartTime
;

The result:

+----------+------+-----------+---------+
| Activity | rank | StartTime | EndTime |
+----------+------+-----------+---------+
| Meeting  |    1 |       800 |     900 |
| Shift    |    4 |       900 |    1015 |
| Break1   |    3 |      1015 |    1030 |
| Shift    |    4 |      1030 |    1700 |
| Break2   |    3 |      1700 |    1715 |
| Shift    |    4 |      1715 |    1900 |
+----------+------+-----------+---------+

Find highest-ranked pieces of intersecting rows in SQL

Solution 1:

Related

Recent Posts