Combinatorial optimization - improve performance
I am writing a program to solve a combinatorial optimization problem. I have been working on an algorithm that gets the desired results, but I am having difficulties getting the algorithm to perform well.
The problem domain is the medical equipment supply industry, and the basic requirement is to assign items to packages so that the rental charge is minimized. Equipment can be rented individually, but there are also packages defined containing a combination of equipments, and the rental charge of the packages is less than the total rental charge of the individual equipments. While generating the bill, I need to assign the items to packages wherever possible in such a way that the total rental charge is minimized.
In my model, packages are defined as follows
- Each package can contains 1..N Groups G1, G2, G3,.. GN.
- Each Group must have at least X items (where X >= 0), and at most Y items ( Y > 0, Y >= X)
- Each Group can contain 1..n Item Types IT1, IT2, ... ITn
- Each Item Type must be included at least A times (A >= 0) in the group, and at most B times (B > 0, B >= A)
An example of a packages could be as follows;
Bed and Wheelchair package
Group: Bed (total item min 1, max 1)
Bed 6 feet (min 0, max 1)
Bed 6.5 feet (min 0, max 1)
Group: Wheelchair (total item min 1, max 1)
Manual wheelchair (min 0, max 1)
Motorized wheelchair (min 0, max 1)
Group: Accessories (min 0, max 3)
Air Mattress (min 0, max 1)
Air blower (min 0, max 1)
Pillows (min 0, max 2)
There would be multiple packages available similar to the above. One particular Item Type can appear only once in a package. However, different packages can have the same Item Type.
Assumptions / Requirements:
- Each package has a rental charge associated with it, as does each Item Type.
- When renting a package, only the package price is charged, not the items inside the package. A rental can consist of both packages and individual items.
- When renting out X number of items of different types, the items should be assigned to packages where possible, so that the total rental is minimized.
- There can be multiple ways of forming the packages, and the ones with the lowest rental should be selected.
- There can be multiple combinations that produce the lowest rental. All of them are required in the result. (I may be able to work around this requirement if absolutely needed).
- I need to support rental of up to 25 items. The number of available packages can vary from 10 to 20.
- An additional complexity that I will need to add later is to support the concept of packages contained inside packages.
- I need to run this for multiple rental bills in a batch mode, which makes the performance of the algorithm so critical.
My approach so far:
- For each item, find a list of the packages that it can belong to based on the item type.
- Create a graph where each node is a item-package assignment. Arrange the graph so that the nodes at each leaf level represent the possible package assignments for a single item.
- Do a depth first traversal of the graph, pruning the subtree and backtracking when it is obvious that the subtree cannot result in an optimal solution. I am pruning the subtree when either of the following is false:
- When the remaining items are included, can it satisfy the minimum quantities required for each package in the branch so far ?
- Is the total price of the packages included so far less than the total of the individual price of all the line items ?
The approach is ending up with an expensive calculation. In one test with 24 items and a small number of packages, it traversed 2.5 million nodes and took more than a minute for the results to come back. Ideally, I would like the results to be returned in seconds.
I will very much appreciate any pointers to an efficient algorithm that can be used for this problem.
EDIT: I am adding an example of a bill and package.
Bed and Accessories package
Group: Bed (total item min 1, max 1)
Bed (B) (min 1, max 1)
Group: Accessories (total item min 0, max 2)
Mattress (M) (min 0, max 1)
Pillows (P) (min 1, max 2)
Total quantities in bill: B: 2, M: 3, P: 5
The packages can be formed in different ways:
2 packages, remaining items billed individually
(1B, 1M, 1P), (1B, 2P)
(1B, 2P), (1B, 2P)
(1B), (1B)
etc
1 package, remaining items billed individually
(1B, 1P),
(1B, 2P),
(1B, 1M, 1P),
(1B),
etc
With IP, the function to be minimized can be defined as
aD + bB + cM + dP
where D is the quantity of discount package
B, M and P are the quantities of the items not billed as packages,
and a, b, c, d are the rental costs for each.
Some of the constraints:
B <= 2
M <= 3
P <= 5
D + B = 2
However, for the Accessories group, it is getting complex. I cannot have
D + M = 3,
as D can be formed without any mattress, like the [(1B, 2P), (1B, 2P)] combination. In this combination, D = 2, and M = 3, since all the Ms are being billed individually, and there are 2 discount packages. I was thinking of introducing some additional variables to represent what quantities are picked for each group, but I think that will result in a quadratic constraint.
Here is Lingo code that I believe solves your simplified problem as an integer program. (I tested it, and the answers make sense.) This should be easily translatable to whatever IP solver you use.
Some comments:
- You need to know what the maximum number of packages of each type is in advance. In practice I think that is doable. For instance, in the example the maximum number is 2 since there are only 2 beds in the bill.
- I think the only integer variables you will need to specify are the Pj's as binary. Since all the other variables have 1's as their coefficients in the constraints, the other variables should come out to have integer values in the optimal solution even without specifying that in advance. That should make the code run much faster. If I'm wrong about this it's easy to modify the code to require integer-valued variables.
- If you allow more than one type of package some of the variables will need to be doubly-indexed. For example, you will need $BP(i,j)$ as the number of beds sold in package j of package type i, as well as $P(i,j)$ as the binary variable equal to 1 if package $j$ of type $i$ is used and 0 otherwise.
! D = number of packages
! BI = number of beds sold individually;
! MI = number of mattresses sold individually;
! PI = number of pillows sold individually;
MIN = 12*D + 10*BI + 6*MI + 3*PI;
! P1 and P2 are binary variables.
! Pj = 1 if package j is used and 0 otherwise
! You need a Pj variable for each potential package j.
! The following constraints force D to be 1 if P1=1 and D to be 2 if P2=1;
! In general, you need a constraint of the form j*Pj <= D for each package Pj.
P1 <= D;
2*P2 <= D;
! Must supply all bill items;
BP1 + BP2 + BI = 2;
MP1 + MP2 + MI = 3;
PP1 + PP2 + PI = 5;
! BPj = number of beds sold in package j.
! Similarly for MPj and PPj for mattresses and pillows.
! These constraints implement the maximum and minimum requirements for
! each package and simultaneously force BPj = MPj = PPj = 0 if Pj = 0.
! You need a set of these constraints for each potential package j;
P1 <= BP1; BP1 <= P1;
MP1 <= P1;
P1 <= PP1; PP1 <= 2*P1;
MP1 + PP1 <= 2*P1;
P2 <= BP2; BP2 <= P2;
MP2 <= P2;
P2 <= PP2; PP2 <= 2*P2;
MP2 + PP2 <= 2*P2;
! Require Pj to be binary;
@BIN(P1); @BIN(P2);
A user designed interface would definitely simplify this process. Have the program designed so that the person ordering the supplied enters in exactly what they want. Drop down selections can simplify how many items can be max/min. After the user selects everything, the program inserts the proper items into the package. Each item will have the cost tied to it. The user selects everything so the computer system doesn't have to run through each option to determine if the item should be in the package.
Not sure if I understood your basis for how the program works, but that's what I got out of it. Hope that helps!
If you exhaustively traverse all nodes, your response time will get exponentially worse as itmes and packages increase.
You can try a Genetic algorithm, which will get you to near optimal, but not guaranteed optimal results in a configurable amount of time. The longer you let it run the better it will get.
You can also architect your algortithm to spread across multiple hardware nodes, so you can reduce the time to search the space by searching in parallel.