Algorithm to find which number in a list sum up to a certain number

I have a list of numbers. I also have a certain sum. The sum is made from a few numbers from my list (I may/may not know how many numbers it's made from). Is there a fast algorithm to get a list of possible numbers? Written in Python would be great, but pseudo-code's good too. (I can't yet read anything other than Python :P )

Example

list = [1,2,3,10]
sum = 12
result = [2,10]

NOTE: I do know of Algorithm to find which numbers from a list of size n sum to another number (but I cannot read C# and I'm unable to check if it works for my needs. I'm on Linux and I tried using Mono but I get errors and I can't figure out how to work C# :(
AND I do know of algorithm to sum up a list of numbers for all combinations (but it seems to be fairly inefficient. I don't need all combinations.)


Solution 1:

This problem reduces to the 0-1 Knapsack Problem, where you are trying to find a set with an exact sum. The solution depends on the constraints, in the general case this problem is NP-Complete.

However, if the maximum search sum (let's call it S) is not too high, then you can solve the problem using dynamic programming. I will explain it using a recursive function and memoization, which is easier to understand than a bottom-up approach.

Let's code a function f(v, i, S), such that it returns the number of subsets in v[i:] that sums exactly to S. To solve it recursively, first we have to analyze the base (i.e.: v[i:] is empty):

  • S == 0: The only subset of [] has sum 0, so it is a valid subset. Because of this, the function should return 1.

  • S != 0: As the only subset of [] has sum 0, there is not a valid subset. Because of this, the function should return 0.

Then, let's analyze the recursive case (i.e.: v[i:] is not empty). There are two choices: include the number v[i] in the current subset, or not include it. If we include v[i], then we are looking subsets that have sum S - v[i], otherwise, we are still looking for subsets with sum S. The function f might be implemented in the following way:

def f(v, i, S):
  if i >= len(v): return 1 if S == 0 else 0
  count = f(v, i + 1, S)
  count += f(v, i + 1, S - v[i])
  return count

v = [1, 2, 3, 10]
sum = 12
print(f(v, 0, sum))

By checking f(v, 0, S) > 0, you can know if there is a solution to your problem. However, this code is too slow, each recursive call spawns two new calls, which leads to an O(2^n) algorithm. Now, we can apply memoization to make it run in time O(n*S), which is faster if S is not too big:

def f(v, i, S, memo):
  if i >= len(v): return 1 if S == 0 else 0
  if (i, S) not in memo:  # <-- Check if value has not been calculated.
    count = f(v, i + 1, S, memo)
    count += f(v, i + 1, S - v[i], memo)
    memo[(i, S)] = count  # <-- Memoize calculated result.
  return memo[(i, S)]     # <-- Return memoized value.

v = [1, 2, 3, 10]
sum = 12
memo = dict()
print(f(v, 0, sum, memo))

Now, it is possible to code a function g that returns one subset that sums S. To do this, it is enough to add elements only if there is at least one solution including them:

def f(v, i, S, memo):
  # ... same as before ...

def g(v, S, memo):
  subset = []
  for i, x in enumerate(v):
    # Check if there is still a solution if we include v[i]
    if f(v, i + 1, S - x, memo) > 0:
      subset.append(x)
      S -= x
  return subset

v = [1, 2, 3, 10]
sum = 12
memo = dict()
if f(v, 0, sum, memo) == 0: print("There are no valid subsets.")
else: print(g(v, sum, memo))

Disclaimer: This solution says there are two subsets of [10, 10] that sums 10. This is because it assumes that the first ten is different to the second ten. The algorithm can be fixed to assume that both tens are equal (and thus answer one), but that is a bit more complicated.

Solution 2:

I know I'm giving an answer 10 years later since you asked this, but i really needed to know how to do this an the way jbernadas did it was too hard for me, so i googled it for an hour and I found a python library itertools that gets the job done!

I hope this help to future newbie programmers. You just have to import the library and use the .combinations() method, it is that simple, it returns all the subsets in a set with order, I mean:

For the set [1, 2, 3, 4] and a subset with length 3 it will not return [1, 2, 3][1, 3, 2][2, 3, 1] it will return just [1, 2, 3]

As you want ALL the subsets of a set you can iterate it:

import itertools

sequence = [1, 2, 3, 4]
for i in range(len(sequence)):
    for j in itertools.combinations(sequence, i):
        print(j)

The output will be

() (1,) (2,) (3,) (4,) (1, 2) (1, 3) (1, 4) (2, 3) (2, 4) (3, 4) (1, 2, 3) (1, 2, 4) (1, 3, 4) (2, 3, 4)

Hope this help!

Solution 3:

So, the logic is to reverse sort the numbers,and suppose the list of numbers is l and sum to be formed is s.

   for i in b:
            if(a(round(n-i,2),b[b.index(i)+1:])):
                r.append(i)    
                return True
        return False

then, we go through this loop and a number is selected from l in order and let say it is i . there are 2 possible cases either i is the part of sum or not. So, we assume that i is part of solution and then the problem reduces to l being l[l.index(i+1):] and s being s-i so, if our function is a(l,s) then we call a(l[l.index(i+1):] ,s-i). and if i is not a part of s then we have to form s from l[l.index(i+1):] list. So it is similar in both the cases , only change is if i is part of s, then s=s-i and otherwise s=s only.

now to reduce the problem such that in case numbers in l are greater than s we remove them to reduce the complexity until l is empty and in that case the numbers which are selected are not a part of our solution and we return false.

if(len(b)==0):
    return False    
while(b[0]>n):
    b.remove(b[0])
    if(len(b)==0):
        return False    

and in case l has only 1 element left then either it can be part of s then we return true or it is not then we return false and loop will go through other number.

if(b[0]==n):
    r.append(b[0])
    return True
if(len(b)==1):
    return False

note in the loop if have used b..but b is our list only.and i have rounded wherever it is possible, so that we should not get wrong answer due to floating point calculations in python.

r=[]
list_of_numbers=[61.12,13.11,100.12,12.32,200,60.00,145.34,14.22,100.21,14.77,214.35,200.32,65.43,0.49,132.13,143.21,156.34,11.32,12.34,15.67,17.89,21.23,14.21,12,122,134]
list_of_numbers=sorted(list_of_numbers)
list_of_numbers.reverse()
sum_to_be_formed=401.54
def a(n,b):
    global r
    if(len(b)==0):
        return False    
    while(b[0]>n):
        b.remove(b[0])
        if(len(b)==0):
            return False    
    if(b[0]==n):
        r.append(b[0])
        return True
    if(len(b)==1):
        return False
    for i in b:
        if(a(round(n-i,2),b[b.index(i)+1:])):
            r.append(i)    
            return True
    return False
if(a(sum_to_be_formed,list_of_numbers)):
    print(r)

this solution works fast.more fast than one explained above. However this works for positive numbers only. However also it works good if there is a solution only otherwise it takes to much time to get out of loops.

an example run is like this lets say

    l=[1,6,7,8,10]

and s=22 i.e. s=1+6+7+8
so it goes through like this 

1.) [10, 8, 7, 6, 1] 22
i.e. 10  is selected to be part of 22..so s=22-10=12 and l=l.remove(10)
2.) [8, 7, 6, 1] 12
i.e. 8  is selected to be part of 12..so s=12-8=4 and l=l.remove(8)
3.) [7, 6, 1] 4  
now 7,6 are removed and 1!=4 so it will return false for this execution where 8 is selected.
4.)[6, 1] 5
i.e. 7  is selected to be part of 12..so s=12-7=5 and l=l.remove(7)
now 6 are removed and 1!=5 so it will return false for this execution where 7 is selected.
5.)[1] 6
i.e. 6  is selected to be part of 12..so s=12-6=6 and l=l.remove(6)
now 1!=6 so it will return false for this execution where 6 is selected.
6.)[] 11
i.e. 1 is selected to be part of 12..so s=12-1=1 and l=l.remove(1)
now l is empty so all the cases for which 10 was a part of s are false and so 10 is not a part of s and we now start with 8 and same cases follow.
7.)[7, 6, 1] 14
8.)[6, 1] 7
9.)[1] 1

just to give a comparison which i ran on my computer which is not so good. using

l=[61.12,13.11,100.12,12.32,200,60.00,145.34,14.22,100.21,14.77,214.35,145.21,123.56,11.90,200.32,65.43,0.49,132.13,143.21,156.34,11.32,12.34,15.67,17.89,21.23,14.21,12,122,134]

and

s=2000

my loop ran 1018 times and 31 ms.

and previous code loop ran 3415587 times and took somewhere near 16 seconds.

however in case a solution does not exist my code ran more than few minutes so i stopped it and previous code ran near around 17 ms only and previous code works with negative numbers also.

so i thing some improvements can be done.