How to divide a set into two subsets such that difference between the sum of numbers in two sets is minimal?
Given a set of numbers, divide the numbers into two subsets such that difference between the sum of numbers in two subsets is minimal.
This is the idea that I have, but I am not sure if this is a correct solution:
- Sort the array
- Take the first 2 elements. Consider them as 2 sets (each having 1 element)
- Take the next element from the array.
- Decide in which set should this element go (by computing the sum => it should be minimum)
- Repeat
Is this the correct solution? Can we do better?
The decision version of the problem you are describing is an NP-complete problem and it is called the partition problem. There are a number of approximations which provide, in many cases, optimal or, at least, good enough solutions.
The simple algorithm you described is a way playground kids would pick teams. This greedy algorithm performs remarkably well if the numbers in the set are of similar orders of magnitude.
The article The Easiest Hardest Problem, by American Scientist, gives an excellent analysis of the problem. You should go through and read it!
No, that doesn't work. There is no polynomial time solution (unless P=NP). The best you can do is just look at all different subsets. Have a look at the subset sum problem.
Consider the list [0, 1, 5, 6]
. You will claim {0, 5}
and {1, 6}
, when the best answer is actually {0, 1, 5}
and {6}
.
No, Your algorithm is wrong. Your algo follows a greedy approach. I implemented your approach and it failed over this test case: (You may try here)
A greedy algorithm:
#include<bits/stdc++.h>
#define rep(i,_n) for(int i=0;i<_n;i++)
using namespace std;
#define MXN 55
int a[MXN];
int main() {
//code
int t,n,c;
cin>>t;
while(t--){
cin>>n;
rep(i,n) cin>>a[i];
sort(a, a+n);
reverse(a, a+n);
ll sum1 = 0, sum2 = 0;
rep(i,n){
cout<<a[i]<<endl;
if(sum1<=sum2)
sum1 += a[i];
else
sum2 += a[i];
}
cout<<abs(sum1-sum2)<<endl;
}
return 0;
}
Test case:
1
8
16 14 13 13 12 10 9 3
Wrong Ans: 6
16 13 10 9
14 13 12 3
Correct Ans: 0
16 13 13 3
14 12 10 9
The reason greedy algorithm fails is that it does not consider cases when taking a larger element in current larger sum set and later a much smaller in the larger sum set may result much better results. It always try to minimize current difference without exploring or knowing further possibilities, while in a correct solution you might include an element in a larger set and include a much smaller element later to compensate this difference, same as in above test case.
Correct Solution:
To understand the solution, you will need to understand all below problems in order:
- 0/1 Knapsack with Dynamic Programming
- Partition Equal Subset Sum with DP
- Solution
My Code (Same logic as this):
#include<bits/stdc++.h>
#define rep(i,_n) for(int i=0;i<_n;i++)
using namespace std;
#define MXN 55
int arr[MXN];
int dp[MXN][MXN*MXN];
int main() {
//code
int t,N,c;
cin>>t;
while(t--){
rep(i,MXN) fill(dp[i], dp[i]+MXN*MXN, 0);
cin>>N;
rep(i,N) cin>>arr[i];
int sum = accumulate(arr, arr+N, 0);
dp[0][0] = 1;
for(int i=1; i<=N; i++)
for(int j=sum; j>=0; j--)
dp[i][j] |= (dp[i-1][j] | (j>=arr[i-1] ? dp[i-1][j-arr[i-1]] : 0));
int res = sum;
for(int i=0; i<=sum/2; i++)
if(dp[N][i]) res = min(res, abs(i - (sum-i)));
cout<<res<<endl;
}
return 0;
}
Combinations over combinations approach:
import itertools as it
def min_diff_sets(data):
"""
Parameters:
- `data`: input list.
Return:
- min diff between sum of numbers in two sets
"""
if len(data) == 1:
return data[0]
s = sum(data)
# `a` is list of all possible combinations of all possible lengths (from 1
# to len(data) )
a = []
for i in range(1, len(data)):
a.extend(list(it.combinations(data, i)))
# `b` is list of all possible pairs (combinations) of all elements from `a`
b = it.combinations(a, 2)
# `c` is going to be final correct list of combinations.
# Let's apply 2 filters:
# 1. leave only pairs where: sum of all elements == sum(data)
# 2. leave only pairs where: flat list from pairs == data
c = filter(lambda x: sum(x[0])+sum(x[1])==s, b)
c = filter(lambda x: sorted([i for sub in x for i in sub])==sorted(data), c)
# `res` = [min_diff_between_sum_of_numbers_in_two_sets,
# ((set_1), (set_2))
# ]
res = sorted([(abs(sum(i[0]) - sum(i[1])), i) for i in c],
key=lambda x: x[0])
return min([i[0] for i in res])
if __name__ == '__main__':
assert min_diff_sets([10, 10]) == 0, "1st example"
assert min_diff_sets([10]) == 10, "2nd example"
assert min_diff_sets([5, 8, 13, 27, 14]) == 3, "3rd example"
assert min_diff_sets([5, 5, 6, 5]) == 1, "4th example"
assert min_diff_sets([12, 30, 30, 32, 42, 49]) == 9, "5th example"
assert min_diff_sets([1, 1, 1, 3]) == 0, "6th example"