Merging a list of time-range tuples that have overlapping time-ranges
I have a list of tuples where each tuple is a (start-time, end-time)
. I am trying to merge all overlapping time ranges and return a list of distinct time ranges.
For example
[(1, 5), (2, 4), (3, 6)] ---> [(1,6)]
[(1, 3), (2, 4), (5, 8)] ---> [(1, 4), (5,8)]
Here is how I implemented it.
# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ...
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d: a-------b Ans: [(a,b),(c,d)]
# c---d
# c<=b<d: a-------b Ans: [(a,d)]
# c---d
# c<d<b: a-------b Ans: [(a,b)]
# c---d
#================================================
def mergeoverlapping(initialranges):
i = sorted(set([tuple(sorted(x)) for x in initialranges]))
# initialize final ranges to [(a,b)]
f = [i[0]]
for c, d in i[1:]:
a, b = f[-1]
if c<=b<d:
f[-1] = a, d
elif b<c<d:
f.append((c,d))
else:
# else case included for clarity. Since
# we already sorted the tuples and the list
# only remaining possibility is c<d<b
# in which case we can silently pass
pass
return f
I am trying to figure out if
- Is the a an built-in function in some python module that can do this more efficiently? or
- Is there a more pythonic way of accomplishing the same goal?
Your help is appreciated. Thanks!
A few ways to make it more efficient, Pythonic:
- Eliminate the
set()
construction, since the algorithm should prune out duplicates during in the main loop. - If you just need to iterate over the results, use
yield
to generate the values. - Reduce construction of intermediate objects, for example: move the
tuple()
call to the point where the final values are produced, saving you from having to construct and throw away extra tuples, and reuse a listsaved
for storing the current time range for comparison.
Code:
def merge(times):
saved = list(times[0])
for st, en in sorted([sorted(t) for t in times]):
if st <= saved[1]:
saved[1] = max(saved[1], en)
else:
yield tuple(saved)
saved[0] = st
saved[1] = en
yield tuple(saved)
data = [
[(1, 5), (2, 4), (3, 6)],
[(1, 3), (2, 4), (5, 8)]
]
for times in data:
print list(merge(times))
Sort tuples then list, if t1.right>=t2.left => merge and restart with the new list, ...
-->
def f(l, sort = True):
if sort:
sl = sorted(tuple(sorted(i)) for i in l)
else:
sl = l
if len(sl) > 1:
if sl[0][1] >= sl[1][0]:
sl[0] = (sl[0][0], sl[1][1])
del sl[1]
if len(sl) < len(l):
return f(sl, False)
return sl
The sort part: use standard sorting, it compares tuples the right way already.
sorted_tuples = sorted(initial_ranges)
The merge part. It eliminates duplicate ranges, too, so no need for a set
. Suppose you have current_tuple
and next_tuple
.
c_start, c_end = current_tuple
n_start, n_end = next_tuple
if n_start <= c_end:
merged_tuple = min(c_start, n_start), max(c_end, n_end)
I hope the logic is clear enough.
To peek next tuple, you can use indexed access to sorted tuples
; it's a wholly known sequence anyway.
Sort all boundaries then take all pairs where a boundary end is followed by a boundary start.
def mergeOverlapping(initialranges):
def allBoundaries():
for r in initialranges:
yield r[0], True
yield r[1], False
def getBoundaries(boundaries):
yield boundaries[0][0]
for i in range(1, len(boundaries) - 1):
if not boundaries[i][1] and boundaries[i + 1][1]:
yield boundaries[i][0]
yield boundaries[i + 1][0]
yield boundaries[-1][0]
return getBoundaries(sorted(allBoundaries()))
Hm, not that beautiful but was fun to write at least!
EDIT: Years later, after an upvote, I realised my code was wrong! This is the new version just for fun:
def mergeOverlapping(initialRanges):
def allBoundaries():
for r in initialRanges:
yield r[0], -1
yield r[1], 1
def getBoundaries(boundaries):
openrange = 0
for value, boundary in boundaries:
if not openrange:
yield value
openrange += boundary
if not openrange:
yield value
def outputAsRanges(b):
while b:
yield (b.next(), b.next())
return outputAsRanges(getBoundaries(sorted(allBoundaries())))
Basically I mark the boundaries with -1 or 1 and then sort them by value and only output them when the balance between open and closed braces is zero.