Find non-common elements in lists
I'm trying to write a piece of code that can automatically factor an expression. For example, if I have two lists [1,2,3,4] and [2,3,5], the code should be able to find the common elements in the two lists, [2,3], and combine the remaining elements into a new list, [1,4,5].
From this post: How to find list intersection? I see that the common elements can be found by
set([1,2,3,4]) & set([2,3,5])
Is there an easy way to retrieve non-common elements from each list, in my example being [1,4] and [5]?
I can go ahead and do a for loop:
lists = [[1,2,3,4], [2,3,5]]
common = [2, 3]
nonCommon = []
for eachList in lists:
    for elem in eachList:
        if elem not in common:
            nonCommon.append(elem)
But this seems redundant and inefficient. Does Python provide any handy function that can do that? Thanks in advance!!
Solution 1:
Use the symmetric difference operator for sets (aka the XOR operator, ^):
>>> set([1,2,3]) ^ set([3,4,5])
set([1, 2, 4, 5])
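Applied to the OP's own lists, the XOR gives the combined non-common list in one step (a small sketch of my own using the question's data):

```python
a = set([1, 2, 3, 4])
b = set([2, 3, 5])

# Symmetric difference: elements that appear in exactly one of the two sets
combined = a ^ b
print(sorted(combined))  # [1, 4, 5]
```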
Solution 2:
You can use the intersection concept to deal with this kind of problem.
b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
set(b1).intersection(b2)
Out[22]: {4, 5}
The best thing about this approach is that it works fast for large data too: with 607,139 elements in b1 and 296,029 in b2, I get my results in about 2.9 seconds.
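To get from the intersection to the non-common elements the OP actually asked for, one sketch (my addition, not part of the answer above) filters both lists against the common set:

```python
b1 = [1, 2, 3, 4, 5, 9, 11, 15]
b2 = [4, 5, 6, 7, 8]

common = set(b1).intersection(b2)  # {4, 5}

# Keep everything from both lists that is not in the intersection
non_common = [x for x in b1 + b2 if x not in common]
print(non_common)  # [1, 2, 3, 9, 11, 15, 6, 7, 8]
```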
Solution 3:
Old question, but it looks like Python has a built-in set method that provides exactly what you're looking for: .difference().
EXAMPLE
list_one = [1,2,3,4]
list_two = [2,3,5]
one_not_two = set(list_one).difference(list_two)
# set([1, 4])
two_not_one = set(list_two).difference(list_one)
# set([5])
This could also be written as:
one_not_two = set(list_one) - set(list_two)
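Putting the two one-sided differences together reproduces everything the OP asked for: the per-list leftovers and the combined list (a short sketch of mine using the question's data):

```python
list_one = [1, 2, 3, 4]
list_two = [2, 3, 5]

one_not_two = set(list_one) - set(list_two)  # {1, 4}
two_not_one = set(list_two) - set(list_one)  # {5}
combined = one_not_two | two_not_one         # union of both leftovers
print(sorted(combined))  # [1, 4, 5]
```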
Timing
I ran some timing tests on both and it appears that .difference()
has a slight edge, to the tune of 10 - 15% but each method took about an eighth of a second to filter 1M items (random integers between 500 and 100,000), so unless you're very time sensitive, it's probably immaterial.
Other Notes
It appears the OP is looking for a solution that provides two separate lists (or sets) - one containing items from the first list not in the second, and vice versa. Most of the previous answers return a single list or set that includes all of the items.
There is also the question of whether items that may be duplicated in the first list should be counted multiple times or just once.
If the OP wants to maintain duplicates, a list comprehension could be used, for example:
one_not_two = [ x for x in list_one if x not in list_two ]
two_not_one = [ x for x in list_two if x not in list_one ]
...which is roughly the same solution as posed in the original question, only a little cleaner. This method would maintain duplicates from the original list but is considerably (like multiple orders of magnitude) slower for larger data sets.
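If duplicates matter but the quadratic x not in list_two test is too slow, a common workaround (my suggestion, not part of the answer above) is to precompute sets just for the membership checks; the comprehensions then stay linear while still preserving duplicates:

```python
list_one = [1, 1, 2, 3, 4]
list_two = [2, 3, 5]

set_two = set(list_two)  # O(1) membership tests instead of scanning the list
set_one = set(list_one)

one_not_two = [x for x in list_one if x not in set_two]
two_not_one = [x for x in list_two if x not in set_one]
print(one_not_two)  # [1, 1, 4]  (the duplicate 1 is preserved)
print(two_not_one)  # [5]
```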