Eliminating duplicates from a list without a second list/module
This solution seems to meet your criteria:
xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
while True:
for i, x in enumerate(xs):
try:
del xs[xs.index(x, i+1)]
break
except ValueError:
continue
else:
break
print(xs)
@cresht pointed out the previous version created a new list in a slice - that may be the case, but I've fixed that here, and the new solution is even more brief.
By the way, I prefer the look of the solution offered by @cresht, but that loops through the list twice explicitly while having to do another lookup in .pop()
- it'd be worth checking which is faster to pick the winner.
Edit, I tested, mine's a bit faster:
from timeit import timeit
def dedupe(xs):
while True:
for i, x in enumerate(xs):
try:
del xs[xs.index(x, i+1)]
break
except ValueError:
continue
else:
break
def remove_duplicates(arr: list) -> None:
for i, x in enumerate(arr):
for j, y in enumerate(arr):
if x == y and i != j:
arr.pop(j)
def test_dedupe():
xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
dedupe(xs)
def test_remove_duplicates():
xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
remove_duplicates(xs)
print(timeit(test_dedupe))
print(timeit(test_remove_duplicates))
Result:
1.7659466000004613
2.694205100000545
Both solutions meet the criteria, but this one is just a bit faster. (by default, timeit
runs them a million times)
Note that doing something like what @schmulvad proposes is of course the fastest and arguably best solution - but I don't think it meets the criteria of the problem:
def test_simple():
xs = [1, 2, 3, 3, 4, 5, 6, 4, 1]
xs = list(dict.fromkeys(xs))
print(timeit(test_simple))
Result:
0.4907456999972055
Fast, but creates a new copy of the original (not a list, but still a copy).
As of Python 3.7, standard dict
is guaranteed to preserve order. Then you can do something simply like:
list(dict.fromkeys(your_lst))
For lower version of Python, you can do something similar with collections.OrderedDict
, however, this would require you to import a (built-in) module.
If you really cannot use any modules and are below Python 3.7, then I think you have to write your own function. This does not have quadratic runtime contrary to the other answers, but it does create a set however. Not explicitly against your requirements, but it might be a dealbreaker. You can consider it for a much better asymptotic runtime at the cost of some extra memory:
def ordered_remove_duplicates(lst):
seen, idx = set(), 0
while idx < len(lst):
elm = lst[idx]
if elm in seen:
lst.pop(idx)
else:
seen.add(elm)
idx += 1
return lst
def dedup_1(xs):
# i = len(xs) - 1
l = len(xs)
for i in reversed(range(l)):
b = xs.pop(i)
if b not in xs:
xs.insert(i, b)
# i -= 1
if i < 1:
break
return xs
Benchmark:
from timeit import timeit
import random
a = random.choices(range(5), k=20)
print(a)
def dedup_1(xs):
# i = len(xs) - 1
l = len(xs)
for i in reversed(range(l)):
b = xs.pop(i)
if b not in xs:
xs.insert(i, b)
# i -= 1
if i < 1:
break
return xs
def dedup_2(xs):
while True:
for i, x in enumerate(xs):
try:
del xs[xs.index(x, i+1)]
break
except ValueError:
continue
else:
break
return xs
print(dedup_1(a[:]))
print(dedup_2(a[:]))
def test_dedup_1():
dedup_1(a[:])
def test_dedup_2():
dedup_2(a[:])
print(timeit(test_dedup_1))
print(timeit(test_dedup_2))
Output:
[1, 1, 4, 1, 1, 2, 3, 1, 1, 2, 2, 2, 0, 4, 1, 1, 1, 0, 4, 0]
[1, 4, 2, 3, 0]
[1, 4, 2, 3, 0]
2.8496891420218162
11.819849102001172
Remarks:
- This one seems simpler and often faster than @Grismar's. On my benchmark:
- For k=20
- Above implementation: 1.0546425660140812
- @Grismar's implementation: 1.6304690400138497
- For k=200:
- Above implementation: 23.068929050001316
- @Grismar's implementation: 491.9963251199806
- It is not recommended to use a try-catch block for control flow.
- This is a crude implementation that runs in O(n^2). This is in line with what @matthiasfrip originally suggested. It is very likely that some improvements / relaxations will be needed for practical purposes.
This is the solution I described in my comment. Let me know if it does not work or if I messed up the code.
def remove_duplicates(arr: list) -> None:
for i, x in enumerate(arr):
j = i + 1
while j < len(arr):
if x == arr[j]:
arr.pop(j)
else:
j += 1