The intersection of all combinations of n sets
Here is a solution inspired by MapReduce: Simplified Data Processing on Large Clusters, which can therefore be written in a distributed way if you want.
Map all of your elements in sets to pairs [element, set]
. Group this list by element. You can just sort and pull out elements. Or you can create a hash of arrays whose keys are the elements and values are the list of sets that element is found in. Then for each element, emit a list of [combination of sets, element]
. Group that by combination. And now for each non-empty combination, you can just read off the elements in it.
If you wish to distribute the computation with a real map-reduce, the first map would map to a key of element, and value of set. The first reduce would just store by element the list of sets it is in. The second map would emit for each element one key for each combination of sets it is in, with the element as the value. And the second reduce would store your final answer.
It may help to work your example in detail. You start with:
Set 1 = 1, 10, 6, 11, 14, 3
Set 2 = 3, 7, 11, 9, 5
Set 3 = 11, 6, 9, 1, 4
You map that to the list:
[1, Set 1], [10, Set 1], [6, Set 1], [11, Set 1], [14, Set 1], [3, Set 1],
[3, Set 2], [7, Set 2], [11, Set 2], [9, Set 2], [5, Set 2],
[11, Set 3], [6, Set 3], [9, Set 3], [1, Set 3], [4, Set 3],
Now group by element (I did it by hand by sorting) to get:
1: Set 1, Set 3
3: Set 1, Set 2
4: Set 3
5: Set 2
6: Set 1, Set 3
7: Set 2
9: Set 2, Set 3
10: Set 1
11: Set 1, Set 2, Set 3
14: Set 1
And now we do the second mapping (skipping the elements that are only in one set) to get:
[(Set 1, Set 3), 1],
[(Set 1, Set 2), 3],
[(Set 1, Set 3), 6],
[(Set 2, Set 3), 9],
[(Set 1, Set 2), 11],
[(Set 1, Set 3), 11],
[(Set 2, Set 3), 11],
[(Set 1, Set 2, Set 3), 11]
Group that by combination of sets and we get:
(Set 1, Set 2): 3, 11
(Set 1, Set 3): 1, 6, 11
(Set 2, Set 3): 9, 11
(Set 1, Set 2, Set 3): 11
Using your example, just note that
1 n 2 n 3
is also
(1 n 2) n 3
So if you cache your (1 n 2) solution, you can reuse it, so that computing 1 n 2 n 3 will only require one additional intersection calculation. Generally, if there are, as in your example, four possible set combinations, then only four intersections will ultimately have to be done if you save and reuse prior results.