Eliminating duplicates from a list without a second list/module

This solution seems to meet your criteria:

xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]


while True:
    for i, x in enumerate(xs):
        try:
            del xs[xs.index(x, i+1)]
            break
        except ValueError:
            continue
    else:
        break


print(xs)

@cresht pointed out the previous version created a new list in a slice - that may be the case, but I've fixed that here, and the new solution is even more brief.

By the way, I prefer the look of the solution offered by @cresht, but that loops through the list twice explicitly while having to do another lookup in .pop() - it'd be worth checking which is faster to pick the winner.

Edit, I tested, mine's a bit faster:

from timeit import timeit


def dedupe(xs):
    while True:
        for i, x in enumerate(xs):
            try:
                del xs[xs.index(x, i+1)]
                break
            except ValueError:
                continue
        else:
            break


def remove_duplicates(arr: list) -> None:
    for i, x in enumerate(arr):
        for j, y in enumerate(arr):
            if x == y and i != j:
                arr.pop(j)


def test_dedupe():
    xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
    dedupe(xs)


def test_remove_duplicates():
    xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
    remove_duplicates(xs)


print(timeit(test_dedupe))
print(timeit(test_remove_duplicates))

Result:

1.7659466000004613
2.694205100000545

Both solutions meet the criteria, but this one is just a bit faster. (by default, timeit runs them a million times)

Note that doing something like what @schmulvad proposes is of course the fastest and arguably best solution - but I don't think it meets the criteria of the problem:

def test_simple():
    xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
    xs = list(dict.fromkeys(xs))


print(timeit(test_simple))

Result:

0.4907456999972055

Fast, but creates a new copy of the original (not a list, but still a copy).

As of Python 3.7, standard dict is guaranteed to preserve order. Then you can do something simply like:

list(dict.fromkeys(your_lst))

For lower version of Python, you can do something similar with collections.OrderedDict, however, this would require you to import a (built-in) module.

If you really cannot use any modules and are below Python 3.7, then I think you have to write your own function. This does not have quadratic runtime contrary to the other answers, but it does create a set however. Not explicitly against your requirements, but it might be a dealbreaker. You can consider it for a much better asymptotic runtime at the cost of some extra memory:

def ordered_remove_duplicates(lst):
    seen, idx = set(), 0
    while idx < len(lst):
        elm = lst[idx]
        if elm in seen:
            lst.pop(idx)
        else:
            seen.add(elm)
            idx += 1

    return lst

def dedup_1(xs):
    # i = len(xs) - 1
    l = len(xs)
    for i in reversed(range(l)):
        b = xs.pop(i)
        if b not in xs:
            xs.insert(i, b)
        # i -= 1
        if i < 1:
            break
    return xs

Benchmark:

from timeit import timeit
import random

a = random.choices(range(5), k=20)
print(a)

def dedup_1(xs):
    # i = len(xs) - 1
    l = len(xs)
    for i in reversed(range(l)):
        b = xs.pop(i)
        if b not in xs:
            xs.insert(i, b)
        # i -= 1
        if i < 1:
            break
    return xs

def dedup_2(xs):
    while True:
        for i, x in enumerate(xs):
            try:
                del xs[xs.index(x, i+1)]
                break
            except ValueError:
                continue
        else:
            break
    return xs


print(dedup_1(a[:]))
print(dedup_2(a[:]))

def test_dedup_1():
    dedup_1(a[:])


def test_dedup_2():
    dedup_2(a[:])


print(timeit(test_dedup_1))
print(timeit(test_dedup_2))

Output:

[1, 1, 4, 1, 1, 2, 3, 1, 1, 2, 2, 2, 0, 4, 1, 1, 1, 0, 4, 0]
[1, 4, 2, 3, 0]
[1, 4, 2, 3, 0]
2.8496891420218162
11.819849102001172

Remarks:

This one seems simpler and often faster than @Grismar's. On my benchmark:

For k=20
- Above implementation: 1.0546425660140812
- @Grismar's implementation: 1.6304690400138497
For k=200:
- Above implementation: 23.068929050001316
- @Grismar's implementation: 491.9963251199806

It is not recommended to use a try-catch block for control flow.
This is a crude implementation that runs in O(n^2). This is in line with what @matthiasfrip originally suggested. It is very likely that some improvements / relaxations will be needed for practical purposes.

This is the solution I described in my comment. Let me know if it does not work or if I messed up the code.

def remove_duplicates(arr: list) -> None:
    for i, x in enumerate(arr):
        j = i + 1
        while j < len(arr):
            if x == arr[j]:
                arr.pop(j)
            else:
                j += 1

Eliminating duplicates from a list without a second list/module

Related

Recent Posts