Find the first missing positive integer in Python

Solution 1:

look to me that doing it in both O(n) and constant space is kind of impossible. (I stand corrected, the link of Rockybilly give such solution)

To do it in constant space one is kind of forced to sort the list and that is O(n log n) for most sorting algorithms, and the ones here looks like insertion sort which is in average O(n2)

So FWIW I choose to discard the constant space in order to do it as close to O(n) as I can think of which give me a 2n solution (in the worse case scenario, in average is n+k) (which in big O notation is still O(n) )

def firstMissingSince(sequence, start=1):
    uniques = set()
    maxitem = start-1
    for e in sequence:
        if e >= start:
            uniques.add(e)
            if e > maxitem:
                maxitem = e
    return next( x for x in range(start, maxitem+2) if x not in uniques )

(version 2 bellow )

I could have used set(sequence), and max(sequence) but both are O(n), so I combined them in the same loop, and I use set for two reason: first is a potential reduction in space by ignoring duplicates and in the same way I only care for number greater or equal to my lower bound (which I also make generic) and second is for the O(1) membership testing.

The last line is a simple linear search for the missing element with the property that default to start if the maximum element is lower to start or is the maximum+1 if the array have no missing element between start and its maximum.

here are some tests borrow from the others answers...

assert 1 == firstMissingSince(A) 
assert 2 == firstMissingSince([1,4,3,6,5])
assert 2 == firstMissingSince([1,44,3,66,55]) 
assert 6 == firstMissingSince([1,2,3,4,5]) 
assert 4 == firstMissingSince([-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11])
assert 4 == firstMissingSince([18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7])
assert 4 == firstMissingSince([-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11])
assert 3 == firstMissingSince([7, -7, 19, 6, -3, -6, 1, -8, -1, 19, -8, 2, 4, 19, 5, 6, 6, 18, 8, 17])

the answer of Rockybilly make me realize that I don't need to know the maximum value at all, so here is version 2

from itertools import count

def firstMissingSince(sequence, start=1):
    uniques = set(sequence) # { x for x in sequence if x>=start } 
    return next( x for x in count(start) if x not in uniques )

Solution 2:

Code

def first_missing_positive(nums):
    bit = 0
    for n in nums:
        if n > 0:
            bit |= 1 << (n - 1)
    flag = 0
    while bit != 0:
        if (bit & 1 == 0):
            break
        flag += 1
        bit >>= 1
    return flag + 1

Explanation

Usually, for constant space requirements, bitwise solutions are great.

The trick here is to pass through all the integers and store their binary representations in a single variable. Say "bit". e.g. when nums = [1, 2, 3] i.e. nums_bitwise = [1, 10, 11], bit = "11". 11 here represents the sequence [1, 10, 11] in a condensed form.

Now, Let's assume 2 was missing from nums. Then we have nums = [1, 3] i.e. nums_bitwise = [1, 11], bit = "101". We can now loop through all bits of the "bit" variable to find out the first missing positive integer 2 i.e. "0" in "101"

Note that, the value of bit will be 5 and 7 for nums=[1, 3] and nums=[1, 2, 3] respectively. You will need to do "{0:b}".format(bit) for its binary representaion.

The key line

bit |= 1 << (n - 1)

Stores all integers in nums by moving left, bit-by-bit, and ORing bits of integers with the default 0s of the bit variable.

Next, we do

if (bit & 1 == 0):
    break

To find the first zero in the condensed bit variable. First zero represents the first missing integer. Shift right by one bit bit >>= 1 and find if that bit is missing. If not, keep ANDing the last bit of bit variable with 1 until the result is 0.

Analysis

Since we look at every integer in nums only once, the time complexity is O(n). Assuming all the integers can be condensed in a single variable, the space complexity is O(1) i.e. constant space.