How to test if one string is a subsequence of another? [duplicate]

Solution 1:

def is_subseq(x, y):
    it = iter(y)
    return all(any(c == ch for c in it) for ch in x)

assert is_subseq('india', 'indonesia')
assert is_subseq('oman', 'romania')
assert is_subseq('mali', 'malawi')
assert not is_subseq('mali', 'banana')
assert not is_subseq('ais', 'indonesia')
assert not is_subseq('ca', 'abc')

Also works for any iterables:

assert is_subseq(['i', 'n', 'd', 'i', 'a'],
                 ['i', 'n', 'd', 'o', 'n', 'e', 's', 'i', 'a'])

UPDATE

Stefan Pochmann suggested this.

def is_subseq(x, y):
    it = iter(y)
    return all(c in it for c in x)

Both versions uses iterators; Iterator yields items that was not yielded in previous iteration.

For example:

>>> it = iter([1,2,3,4])
>>> for x in it:
...     print(x)
...     break
...
1
>>> for x in it:  # `1` is yielded in previous iteration. It's not yielded here.
...     print(x)
...
2
3
4

Solution 2:

Just keep on looking for the next character of your potential subsequence, starting behind the last found one. As soon as one of the characters can't be found in the remainder of the string, it's no subsequence. If all characters can be found this way, it is:

def is_subsequence(needle, haystack):
    current_pos = 0
    for c in needle:
        current_pos = haystack.find(c, current_pos) + 1
        if current_pos == 0:
            return False
    return True

An advantage over the top answer is that not every character needs to be iterated in Python here, since haystack.find(c, current_pos) is looping in C code. So this approach can perform significantly better in the case that the needle is small and the haystack is large:

>>> needle = "needle"
>>> haystack = "haystack" * 1000
>>> %timeit is_subseq(needle, haystack)
296 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit is_subsequence(needle, haystack)
334 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Solution 3:

I did

def is_subsequence(x, y):
    """Test whether x is a subsequence of y"""
    x = list(x)
    for letter in y:
        if x and x[0] == letter:
            x.pop(0)

    return not x