Efficient way to find missing elements in an integer sequence

If the input sequence is sorted, you could use sets here. Take the start and end values from the input list:

def missing_elements(L):
    start, end = L[0], L[-1]
    return sorted(set(range(start, end + 1)).difference(L))

This assumes Python 3; for Python 2, use xrange() to avoid building a list first.

The sorted() call is optional; without it a set() is returned of the missing values, with it you get a sorted list.

Demo:

>>> L = [10,11,13,14,15,16,17,18,20]
>>> missing_elements(L)
[12, 19]

Another approach is by detecting gaps between subsequent numbers; using an older itertools library sliding window recipe:

from itertools import islice, chain

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def missing_elements(L):
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(missing)

This is a pure O(n) operation, and if you know the number of missing items, you can make sure it only produces those and then stops:

def missing_elements(L, count):
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(islice(missing, 0, count))

This will handle larger gaps too; if you are missing 2 items at 11 and 12, it'll still work:

>>> missing_elements([10, 13, 14, 15], 2)
[11, 12]

and the above sample only had to iterate over [10, 13] to figure this out.

Assuming that L is a list of integers with no duplicates, you can infer that the part of the list between start and index is completely consecutive if and only if L[index] == L[start] + (index - start) and similarly with index and end is completely consecutive if and only if L[index] == L[end] - (end - index). This combined with splitting the list into two recursively gives a sublinear solution.

# python 3.3 and up, in older versions, replace "yield from" with yield loop

def missing_elements(L, start, end):
    if end - start <= 1: 
        if L[end] - L[start] > 1:
            yield from range(L[start] + 1, L[end])
        return

    index = start + (end - start) // 2

    # is the lower half consecutive?
    consecutive_low =  L[index] == L[start] + (index - start)
    if not consecutive_low:
        yield from missing_elements(L, start, index)

    # is the upper part consecutive?
    consecutive_high =  L[index] == L[end] - (end - index)
    if not consecutive_high:
        yield from missing_elements(L, index, end)

def main():
    L = [10,11,13,14,15,16,17,18,20]
    print(list(missing_elements(L,0,len(L)-1)))
    L = range(10, 21)
    print(list(missing_elements(L,0,len(L)-1)))

main()

missingItems = [x for x in complete_list if not x in L]

Using collections.Counter:

from collections import Counter

dic = Counter([10, 11, 13, 14, 15, 16, 17, 18, 20])
print([i for i in range(10, 20) if dic[i] == 0])

Output:

[12, 19]

Efficient way to find missing elements in an integer sequence

Related

Recent Posts