Subset Sum algorithm
I am working on this problem:
The Subset Sum problem takes as input a set
X = {x1, x2 ,…, xn}
ofn
integers and another integerK
. The problem is to check if there exists a subsetX'
ofX
whose elements sum toK
and finds the subset if there's any. For example, ifX = {5, 3, 11, 8, 2}
andK = 16
then the answer isYES
since the subsetX' = {5, 11}
has a sum of16
. Implement an algorithm for Subset Sum whose run time is at leastO(nK)
.
Notice complexity O(nK)
. I think dynamic programming may help.
I have found an exponential time algorithm, but it doesn't help.
Can someone help me solve this problem?
Subset Sum is the first NP-complete problem I learned at Macalester. This question is viewed 36000+ times yet I don't see a sufficient answer that explains the algorithm in detail with logic. So I thought I make an attempt to do so.
Assumption:
For the sake of simplicity first I made the assumption that the input set X
contains only positive integers and k
is positive. However, we can tweak the algorithm to handle negative integers and the case if k
is negative.
Logic:
The key to this algorithm or really any DP problem is to break down the problem and start simply from a base case. then we can build on the base case using some knowledge that we know:
- we know that if the set
X
is empty then there is no way we can sum to any value ofk
. - If a set
X
containsk
then it has a subset sum tok
. - we know that if a subset of set
x1
who is a subset ofX
sum tok1
thenX
will have a subset that sum tok1
namelyx1
. - we have a set
X = {x1, x1, x3, ......., xn, xn+1}
. We know it has a subset sum tok1
ifx1 = {x1, x1, x3, ......., xn}
has a subset sum tok - k1
.
Example to illustrate 1,2,3,4:
- it is easy. if you have an empty set {}. you can't have a subset thus you can't have any subset sum.
A set
X = {4}
has a subset sum to 4 because 4 it self is part of the setsay you have a set
x1 = {1,3,5}
who is a subset of setX = {1,3,5,2,8}
. ifx1
has a subset sum tok1 = 8
then that meansX
also has a subset sum to 8 becausex1
is a subset ofX
- say you have a set
X = {1,3,5,2,19}
and we want to know does it have a subset sum to 20. It does and one way to can know if that isx1 = {1,3,5,2}
can sum to (20 - 19) = 1. Since x1 has a subset sum to 1 then when we add 19 to the set x1 we can take that new number 1 + 19 = 20 to create our desired sum 20.
Dynamically build a matrix
Cool! now let us utilize the above four logics and start building from the base case. We are going to build a matrix m
. We define:
matrix
m
hasi+1
rows andk + 1
columns.Each cell of the matrix has value
true
orfalse
.m[i][s] returns true or false to indicate the answer to this question: "using the first
i
items in the array can we find a subset sum tos
? "m[i][s]
returnstrue
for yes andfalse
for no
(note the Wikipedia answer or most of the people build a function m(i,s) but I thought the matrix is an easy way to understand dynamic programming. It works well when we only have positive numbers in the set or array. However the function route is better because you don't have to deal with index out of range, match index of the array and sum to the matrix.....)
Let us build the matrix using an example:
X = {1,3,5,2,8}
k = 9
We are going to build the matrix row by row. We ultimately want to know the cell m[n][k] contain true
or false
.
First Row:
Logic 1. told us that the first row of the matrix should all be false
.
0 1 2 3 4 5 6 7 8 9
_ _ _ _ _ _ _ _ _ _
0| F F F F F F F F F F
1|
2|
3|
4|
5|
Second Row and above: Then for the second row or above, we can use logic 2,3,4 to help us populate the matrix.
- logic 2 tells us that
m[i][s] = (X[i-1] == s)
rememebr m[i] is referring to the ith item in X which is X[i-1] - logic 3 tells us that
m[i][s] = (m[i-1][s])
this is looking at the cell direclty above. - logic 4 tells us that
m[i][s] = (m[i-1][s - X[i-1]])
this is looking at the row above and left of X[i-1] cells.
If any of these is true
then m[i][s]
is true
otherwise false
. so we can rewrite 2,3,4 into m[i][s] = (m[i-1][s] || a[i-1] == s || m[i-1][s - a[i-1]])
Use these above logics to populate the matrix m
. In our example, it looks like this.
0 1 2 3 4 5 6 7 8 9
_ _ _ _ _ _ _ _ _ _
0| F F F F F F F F F F
1| F T F F F F F F F F
2| F T F T T F F F F F
3| F T F T T T T F T T
4| F T T T T T T T T T
5| F T T T T T T T T T
Now use the matrix to answer your question:
look at m[5][9]
which is the original question. using the first 5 items (which is all the items) can we find a subset sum to 9 (k)? and the answer is indicated by that cell which is true
Here is the Code:
import java.util.*;
public class SubSetSum {
public static boolean subSetSum(int[] a, int k){
if(a == null){
return false;
}
//n items in the list
int n = a.length;
//create matrix m
boolean[][] m = new boolean[n + 1][k + 1]; //n + 1 to include 0, k + 1 to include 0
//set first row of matrix to false. This also prevent array index out of bounds: -1
for(int s = 0; s <= k; s++){
m[0][s] = false;
}
//populate matrix m
for(int i = 1; i <= n; i++){
for(int s = 0; s <= k; s++){
if(s - a[i-1] >= 0){ //when it goes left we don't want it to go out of bounds. (logic 4)
m[i][s] = (m[i-1][s] || a[i-1] == s || m[i-1][s - a[i-1]]);
} else {
m[i][s] = (m[i-1][s] || a[i-1] == s);
}
}
}
//print matrix
print(m);
return m[n][k];
}
private static void print(boolean[][] m){
for(int i = 0; i < m.length; i++){
for(int j = 0; j < m[i].length; j++){
if(m[i][j]){
System.out.print("T");
} else {
System.out.print("F");
}
}
System.out.print("\n");
}
}
public static void main(String[] args){
int[] array = {1,3,5,2,8};
int k = 9;
System.out.println(subSetSum(array,k));
}
}
To build the matrix m
takes O((n+1)(k+1)) which is O(nk). it seems like it should be polynomial but it is not! It's actually pseudo polynomial. Read about it here
Again this only works if the input only contains positive numbers. You can easily tweak it to work with negative numbers tho. The matrix would still have n+1 rows but B - A + 1
columns. Where B
is the upperbound and A
is the lowerbound ( +1 to include zero).The matrix would still be You would have to offset s
with the lowerbound.
It is pretty hard to explain DP problem over text from start to end. But I hope this helps those out there that try to understand this problem.
Note that in the examples above the rows of the DP table are sorted. That does not have to be the case.
Here is a DP table for the case of the question, i.e. given a set of {5, 3, 11, 8, 2}. For brevity, I have omitted the false values.
┌─────────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐
│ (index) │ 0 │ 2 │ 3 │ 5 │ 7 │ 8 │ 10 │ 11 │ 13 │ 14 │ 15 │ 16 │
├─────────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┤
│ 0 │ true │ │ │ │ │ │ │ │ │ │ │ │
│ 5 │ true │ │ │ true │ │ │ │ │ │ │ │ │
│ 3 │ true │ │ true │ true │ │ true │ │ │ │ │ │ │
│ 11 │ true │ │ true │ true │ │ true │ │ true │ │ true │ │ true │
│ 8 │ true │ │ true │ true │ │ true │ │ true │ true │ true │ │ true │
│ 2 │ true │ true │ true │ true │ true │ true │ true │ true │ true │ true │ true │ true │
└─────────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘
Below is an implementation in JavaScript which will output the target set {5, 11}:
var subSetSum = function(input, sum) {
let y = input.length;
let x = sum;
if(input.length === 0) return 0;
let d = [];
//fill the rows
for (let i = 0; i <= y; i++) {
d[i] = [];
d[i][0] = true;
}
for (let j = 1; j <= y; j++) { //j row
for (let i = 1; i <= x; i++) { //i column
let num = input[j-1];
if(num === i) {
d[j][i] = true;
} else if(d[j-1][i]) {
d[j][i] = true;
} else if (d[j-1][i-num]) {
d[j][i] = true;
}
}
}
//console.table(d); //uncomment to see the table
if(!d[y][x]) return null;
let searchedSet = [];
for(let j=input.length, i=sum; j>0 && i != 0; j--) {
if(input[j-1] !== i) {
while(d[j-1][i]) { // go up
j--;
}
}
searchedSet.push(input[j-1]);
i = i-input[j-1];
}
return searchedSet;
};
console.log('searched set:'+ JSON.stringify(subSetSum([5, 3, 11, 8, 2], 16)));
Since it looks like all your numbers are positive, you can solve this using dynamic programming:
Start will a boolean array possible
of size K+1 with the first value true, the rest false. The ith value will represent whether a subset sum of i is possible to achieve. For each number n in your set, loop through the possible
array and if the ith value is true, then set the i+nth value to true as well.
At the end, if the kth value in possible
is true, then you can form a subset sum of k. Problem solved in O(NK) time.
Wikipedia's page on the subset sum problem has a detailed explanation of this algorithm applied to sets of integers not guaranteed to be positive.
I'd suggest to read Wiki's algorithm. The algorithm exists there, see Pseudo-polynomial time dynamic programming solution for the O(P*n)
solution, The solution is not polynomial time, is polynomial in (p,n) but it's not polynomial in n+log P (size of input) and because P
can be very large like 2^n, the solution P*n = (2^n)*n is not a polynomial time solution in general, but when p is bounded by some polynomial function of n is polynomial time algorithm.
This problem is NPC, but there is a Pseudo polynomial time
algorithm for it, and belongs to weakly NP-Complete
problems, Also there are Strongly NP-Complete
problems, which means, you can't find any pseudo polynomial time
algorithm for them unless P=NP, and this problem is not in this range of problems, So somehow is easy.
I said this as simple as possible,but it's not an exact definition of a Strongly NP-Complete or Weakly NP-Complete problems.
For detail see Garey and Johnson chapter 4.