Merge sets when two elements in common
This is the follow up of compare sets
I have
Set<Set<Node>> NestedSet = new HashSet<Set<Node>>();
[[Node[0], Node[1], Node[2]], [Node[0], Node[2], Node[6]], [Node[3], Node[4], Node[5]] [Node[2], Node[6], Node[7]] ]
I want to merge the sets when there are two elements in common. For example 0,1,2 and 0,2,6 has two elements in common so merging them to form [0,1,2,6].
Again [0,1,2,6] and [2,6,7] has 2 and 6 common. so merging them and getting [0,1,2,6,7].
The final output should be :
[ [Node[0], Node[1], Node[2], Node[6], Node[7]], [Node[3], Node[4], Node[5]] ]
I tried like this :
for (Set<Node> s1 : NestedSet ) {
Optional<Set<Node>> findFirst = result.stream().filter(p -> { HashSet<Node> temp = new HashSet<>(s1);
temp.retainAll(p);
return temp.size() == 2; }).findFirst();
if (findFirst.isPresent()){
findFirst.get().addAll(s1);
}
else {
result.add(s1);
}
}
But the result I got was :
[[Node[0], Node[1], Node[2], Node[6], Node[7]], [Node[3], Node[4], Node[5]], [Node[0], Node[2], Node[6], Node[7]]]
Any idea ? Is there any way to get the desired output?
Solution 1:
Some considerations:
- Each time you apply a merge, you have to restart the procedure and iterate over the modified collection. Because of this, the iteration order of the input set is important, if you want your code to be deterministic you may want to use collections that give guarantees over their iteration order (e.g. use
LinkedHashSet
(notHashSet
) orList
. - Your current code has side effects as it modifies the supplied sets when merging. In general I think it helps to abstain from creating side effects whenever possible.
The following code does what you want:
static <T> List<Set<T>> mergeSets(Collection<? extends Set<T>> unmergedSets) {
final List<Set<T>> mergedSets = new ArrayList<>(unmergedSets);
List<Integer> mergeCandidate = Collections.emptyList();
do {
mergeCandidate = findMergeCandidate(mergedSets);
// apply the merge
if (!mergeCandidate.isEmpty()) {
// gather the sets to merge
final Set<T> mergedSet = Sets.union(
mergedSets.get(mergeCandidate.get(0)),
mergedSets.get(mergeCandidate.get(1)));
// removes both sets using their index, starts with the highest index
mergedSets.remove(mergeCandidate.get(0).intValue());
mergedSets.remove(mergeCandidate.get(1).intValue());
// add the mergedSet
mergedSets.add(mergedSet);
}
} while (!mergeCandidate.isEmpty());
return mergedSets;
}
// O(n^2/2)
static <T> List<Integer> findMergeCandidate(List<Set<T>> sets) {
for (int i = 0; i < sets.size(); i++) {
for (int j = i + 1; j < sets.size(); j++) {
if (Sets.intersection(sets.get(i), sets.get(j)).size() == 2) {
return Arrays.asList(j, i);
}
}
}
return Collections.emptyList();
}
For testing this method I created two helper methods:
static Set<Integer> set(int... ints) {
return new LinkedHashSet<>(Ints.asList(ints));
}
@SafeVarargs
static <T> Set<Set<T>> sets(Set<T>... sets) {
return new LinkedHashSet<>(Arrays.asList(sets));
}
These helper methods allow to write very readable tests, for example (using the numbers from the question):
public static void main(String[] args) {
// prints [[2, 6, 7, 0, 1]]
System.out.println(mergeSets(sets(set(0, 1, 2, 6), set(2, 6, 7))));
// prints [[3, 4, 5], [0, 2, 6, 1, 7]]
System.out.println(
mergeSets(sets(set(0, 1, 2), set(0, 2, 6), set(3, 4, 5), set(2, 6, 7))));
}
Solution 2:
I'm not sure why you are getting that result, but I do see another problem with this code: It is order-dependent. For example, even if the code worked as intended, it would matter whether [Node[0], Node[1], Node[2]]
is compared first to [Node[0], Node[2], Node[6]]
or [Node[2], Node[6], Node[7]]
. But Sets don't have a defined order, so the result is either non-deterministic or implementation-dependent, depending on how you look at it.
If you really want deterministic order-dependent operations here, you should be using List<Set<Node>>
, rather than Set<Set<Node>>
.