How to remove all duplicate items from a list [duplicate]
How would I use python to check a list and delete all duplicates? I don't want to have to specify what the duplicate item is - I want the code to figure out if there are any and remove them if so, keeping only one instance of each. It also must work if there are multiple duplicates in a list.
For example, in my code below, the list lseparatedOrbList has 12 items - one is repeated six times, one is repeated five times, and there is only one instance of one. I want it to change the list so there are only three items - one of each, and in the same order they appeared before. I tried this:
for i in lseparatedOrbList:
    for j in lseparatedOrblist:
        if lseparatedOrbList[i] == lseparatedOrbList[j]:
            lseparatedOrbList.remove(lseparatedOrbList[j])
But I get the error:
Traceback (most recent call last):
File "qchemOutputSearch.py", line 123, in <module>
for j in lseparatedOrblist:
NameError: name 'lseparatedOrblist' is not defined
I'm guessing it's because I'm trying to loop through lseparatedOrbList while I'm already looping through it, but I can't think of another way to do it.
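For reference, here is a tiny self-contained example (with made-up numbers) of how removing items from a list while looping over it skips elements:
>>> nums = [1, 2, 3, 4]
>>> for n in nums:
...     nums.remove(n)
...
>>> nums
[2, 4]
Each remove() shifts the remaining items left, so the loop's internal index skips every other element.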
Use set():
woduplicates = set(lseparatedOrbList)
Returns a set without duplicates. If you, for some reason, need a list back:
woduplicates = list(set(lseparatedOrbList))
This will, however, have a different order than your original list.
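A quick illustration (the exact output order is a CPython implementation detail and may vary):
>>> list(set([3, 1, 2, 1, 3]))
[1, 2, 3]
The duplicates are gone, but the original order (3 first) is lost.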
Just make a new list to populate: if an item from your original list is not yet in the new list, append it; otherwise move on to the next item.
newlist = []
for i in mylist:
    if i not in newlist:
        newlist.append(i)
This should be faster and will preserve the original order:
seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]
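The setdefault(x, x) call both records x in seen and evaluates to x, so each value is kept only the first time it appears. An equivalent sketch with a plain set (using a hypothetical sample list) works the same way:
my_list = [4, 6, 4, 1, 4, 6, 1]  # hypothetical sample values
seen = set()
# seen.add(x) returns None, so the "or" clause records x
# without affecting the filter result
new_list = [x for x in my_list if not (x in seen or seen.add(x))]
# new_list == [4, 6, 1]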
If you don't care about order, you can just:
new_list = list(set(my_list))
You can do it like this:
x = list(set(x))
For example, if you run the following:
>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 1, 6, 31, 20]
>>> x = list(set(x))
>>> x
you will see this result:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 31]
One caveat: the resulting list will not be in the same order as the original (the order is lost in the process).
The modern way to do it that maintains the order is:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(lseparatedOrbList))
as discussed by Raymond Hettinger in this answer. In Python 3.5 and above this is also the fastest way; see the linked answer for details. Note, however, that the keys must be hashable (as appears to be the case for your list).
As of Python 3.7, regular dicts preserve insertion order as a language feature, so the above call becomes:
>>> list(dict.fromkeys(lseparatedOrbList))
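A quick demonstration (with made-up values, since the original list isn't shown):
>>> lseparatedOrbList = [4, 6, 4, 1, 4, 6, 1]  # hypothetical sample values
>>> list(dict.fromkeys(lseparatedOrbList))
[4, 6, 1]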
Performance:
"""Dedup list."""
import sys
import timeit
repeat = 3
numbers = 1000
setup = """"""
def timer(statement, msg='', _setup=None):
print(msg, min(
timeit.Timer(statement, setup=_setup or setup).repeat(
repeat, numbers)))
print(sys.version)
s = """import random; n=%d; li = [random.randint(0, 100) for _ in range(n)]"""
for siz, m in ((150, "\nFew duplicates"), (15000, "\nMany duplicates")):
print(m)
setup = s % siz
timer('s = set(); [i for i in li if i not in s if not s.add(i)]', "s.add(i):")
timer('list(dict.fromkeys(li))', "dict:")
timer('list(set(li))', 'Not order preserving: list(set(li)):')
gives:
3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]
Few duplicates
s.add(i): 0.008242200000040611
dict: 0.0037373999998635554
Not order preserving: list(set(li)): 0.0029409000001123786
Many duplicates
s.add(i): 0.2839437000000089
dict: 0.21970469999996567
Not order preserving: list(set(li)): 0.102068700000018
So dict seems consistently faster, though it approaches the set.add list comprehension when there are many duplicates; I'm not sure whether varying the numbers further would give different results. list(set(li)) is of course the fastest, but it does not preserve the original list order, which is a requirement here.