How to compute the Pareto Frontier, intuitively speaking?
I'm working on a multi-objective optimization problem and we have 'alternatives' that are quantified on two dimensions - value and cost.
Now the question is: how does one compute a Pareto frontier? I know you can apply algorithms that will do it for you, but I want to know the basic underlying algorithm/mathematical steps that would be employed to come up with a Pareto frontier - I want to be able to do it with pen and paper, even if the algorithm is NOT efficient.
I looked all over, and it seems I only stumble upon various algorithms that can be employed, but not the actual intuition behind how one computes a Pareto frontier in the first place!
The basic definition of the Pareto frontier is that it consists of exactly those alternatives that are not dominated by any other alternative. We say that an alternative $A$ dominates $B$ if $A$ outscores $B$ regardless of the tradeoff between value and cost — that is, if $A$ is both better and cheaper than $B$.
Obviously, both the best and the cheapest alternative always belong to the Pareto frontier, and in fact, they are its endpoints. A simple algorithm to find the other alternatives (if any) on the Pareto frontier is to first sort the alternatives according to one of the objectives — say, cost. One then starts with the cheapest alternative (which, as noted, always belongs in the Pareto frontier) and skips successive alternatives in order of increasing cost until one finds one with a higher value. This alternative is then added to the frontier and the search is restarted from it.
A step-by-step description of the algorithm, assuming that $A_1, \dotsc, A_n$ are the alternatives in increasing order of cost, goes like this:
Algorithm A:
1. Let $i := 1$.
2. Add $A_i$ to the Pareto frontier.
3. Find the smallest $j>i$ such that $\operatorname{value}(A_j) > \operatorname{value}(A_i)$.
4. If no such $j$ exists, stop. Otherwise let $i := j$ and repeat from step 2.
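To make the pen-and-paper procedure concrete, here is a minimal Python sketch of algorithm A (my own illustration, not part of the question); I'm assuming each alternative is given as a (cost, value) pair, with lower cost and higher value being better:

```python
def pareto_frontier(alternatives):
    """Algorithm A: return the non-dominated (cost, value) pairs."""
    # Sort by increasing cost, as the algorithm assumes.
    ordered = sorted(alternatives)
    # The cheapest alternative always belongs to the frontier (step 2).
    frontier = [ordered[0]]
    best_value = ordered[0][1]
    for cost, value in ordered[1:]:
        # Skip alternatives until one has a strictly higher value (step 3),
        # then add it and continue the sweep from there (step 4).
        if value > best_value:
            frontier.append((cost, value))
            best_value = value
    return frontier

# Toy example with made-up numbers:
# pareto_frontier([(1, 2), (3, 1), (4, 5), (6, 7)]) == [(1, 2), (4, 5), (6, 7)]
```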
Addendum
In the comments below, Nupul asked about computing the Pareto frontier for combinations of alternatives.
There are two natural and interesting cases: one where the combinations are sets of alternatives (i.e. each alternative may appear at most once in a combination), and one where they are multisets (which can include the same alternative more than once). I'll address both cases below, but let me first give an alternative algorithm for computing the Pareto frontier for single alternatives:
Algorithm B:
1. Let $A_1, \dotsc, A_n$ be the alternatives sorted in order of increasing cost/value ratio. Let $i := 1$.
2. Add $A_i$ to the Pareto frontier $P$.
3. Find the smallest $j>i$ such that $A_j$ is not dominated by any alternative already in $P$.
4. If no such $j$ exists, stop. Otherwise let $i := j$ and repeat from step 2.
In step 3, to check whether $A_j$ is dominated by any alternative in $P$, it will suffice to find the most expensive alternative $B \in P$ which is still cheaper than $A_j$. (Of course, by symmetry, we could also choose $B$ as the least valuable alternative in $P$ that has higher value than $A_j$.) If $B$ does not dominate $A_j$, neither can any other alternative in $P$.
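Here is a rough Python sketch of algorithm B along these lines, using the shortcut just described: the frontier is kept sorted by cost, so the single candidate dominator is found by binary search. Again, the (cost, value) pair representation is my own assumption.

```python
import bisect

def pareto_frontier_b(alternatives):
    """Algorithm B: return the non-dominated (cost, value) pairs."""
    # Step 1: sort by increasing cost/value ratio.
    ordered = sorted(alternatives, key=lambda a: a[0] / a[1])
    frontier = []   # non-dominated alternatives found so far, sorted by cost
    costs = []      # parallel list of costs, so we can binary-search
    for cost, value in ordered:
        # The only member of P that can dominate this alternative is the
        # most expensive one that is still cheaper than it (step 3).
        k = bisect.bisect_left(costs, cost)
        if k > 0 and frontier[k - 1][1] > value:
            continue                        # dominated: cheaper and better
        frontier.insert(k, (cost, value))   # keep the frontier sorted by cost
        costs.insert(k, cost)
    return frontier
```

(The list inserts above take linear time, so this naive version doesn't actually achieve the bound mentioned below; a balanced search tree or similar would be needed for that.)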
Algorithm B is somewhat more complicated than algorithm A above (in particular, algorithm A runs in $O(n)$ time if the alternatives are already sorted, whereas algorithm B needs $O(n \log n)$ time in any case), but it turns out to generalize better.
Now, let's consider sets of zero or more alternatives. Obviously, we could just enumerate all such sets and then apply algorithm A or B, but this would be very inefficient: for $n$ alternatives, the number of sets is $2^n$. Instead, we can adapt algorithm B to construct the combinations on the fly:
Algorithm C:
1. Let $A_1, \dotsc, A_n$ be the alternatives sorted in order of increasing cost/value ratio. Let $i := 1$. Let $P := \lbrace\emptyset\rbrace$, where $\emptyset$ denotes the combination containing no alternatives.
2. For each combination $C \in P$, let $C^* := C \cup \lbrace A_i \rbrace$. If $C^*$ is not dominated by any combination already in $P$, add $C^*$ to $P$.
3. If $i = n$, stop. Otherwise increment $i$ by one and repeat from step 2.
Again, as in algorithm B, we don't need to compare $C^*$ to every combination in $P$; it's enough to check whether $C^*$ is dominated by the most expensive combination in $P$ that is cheaper than $C^*$.
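A Python sketch of algorithm C, under the same (cost, value)-pair assumption as before; for brevity it checks each candidate against the whole of $P$ with a linear scan rather than the binary-search refinement just mentioned:

```python
def pareto_sets(alternatives):
    """Algorithm C: Pareto frontier over sets of alternatives.
    Each entry of the result is (total_cost, total_value, members)."""
    # Step 1: sort by increasing cost/value ratio; start from the empty set.
    ordered = sorted(alternatives, key=lambda a: a[0] / a[1])
    frontier = [(0, 0, ())]
    for a in ordered:
        additions = []
        for cost, value, members in frontier:
            cand = (cost + a[0], value + a[1], members + (a,))
            # Step 2: keep C* unless some combination already in P
            # is both cheaper and more valuable.
            if not any(c < cand[0] and v > cand[1] for c, v, _ in frontier):
                additions.append(cand)
        frontier.extend(additions)
    return frontier
```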
What about multiset combinations? In this case, obviously, the Pareto frontier $P$ contains infinitely many combinations: in particular, for any combination $C \in P$, the combination $C + A_1$, where $A_1$ is the alternative with the lowest cost/value ratio, also belongs in $P$.
However, the number of times any other alternative can appear in a non-dominated combination must be bounded. (The proof is a bit tedious, but follows from simple geometrical considerations.) Therefore we only need to consider the finite set $P_0$ of those combinations on the Pareto frontier which do not include the alternative $A_1$; the remaining combinations on the frontier are obtained by adding some number of $A_1$s to those combinations.
For multiset combinations, we also have the following useful lemma:
Lemma: Any combination that contains a dominated sub-combination must itself be dominated. (To see why, replace the dominated sub-combination with a combination that dominates it; the result is cheaper and more valuable than the original combination, and hence dominates it.)
In particular, this means that combinations in $P$ can only include alternatives that themselves belong in $P$. Thus, as a first step, we can use algorithm A (or B) to compute the Pareto frontier for single alternatives and discard any alternatives that are not part of it.
For a complete algorithm to compute $P_0$, the following definition turns out to be useful:
Dfn: A combination $B$ is said to $A$-dominate $C$ if the combination $B + nA$ dominates $C$ for some non-negative integer $n \in \mathbb N$. Equivalently, $B$ $A$-dominates $C$ iff $\operatorname{cost}(C) > \operatorname{cost}(B) + n\,\operatorname{cost}(A)$, where $n = \max \left(0, \left\lceil \frac{\operatorname{value}(C)-\operatorname{value}(B)}{\operatorname{value}(A)} \right\rceil \right)$ is the least number of copies of $A$ needed to bring the value of $B + nA$ up to that of $C$.
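In code, the test might look like this (my own sketch, with combinations and alternatives both summarised as (cost, value) totals, and a value tie counted as dominating):

```python
import math

def a_dominates(b, c, a):
    """Does b + n*a dominate c for some n >= 0?  All arguments are
    (cost, value) pairs; for combinations these are the totals."""
    # Least number of copies of a needed to match c's value.
    n = max(0, math.ceil((c[1] - b[1]) / a[1]))
    return c[0] > b[0] + n * a[0]
```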
Using this definition, we can apply the following algorithm:
Algorithm D:
1. Let $A_1, \dotsc, A_n$ be the (non-dominated) alternatives sorted in order of increasing cost/value ratio. Let $i := 2$. Let $P_0 := \lbrace\emptyset\rbrace$.
2. For each combination $C \in P_0$, let $C^* := C + A_i$. If $C^*$ is not $A_1$-dominated by any combination already in $P_0$, add $C^*$ to $P_0$.
3. Repeat from step 2 until no more combinations are added to $P_0$.
4. If $i = n$, stop. Otherwise increment $i$ by one and repeat from step 2.
I think this algorithm should be correct, but to be honest, I'm not 100% sure that there aren't any mistakes or gaps in my proof sketch. So please test it thoroughly before using it for anything that matters. :-)
I also think it should be possible to implement step 2 efficiently in a manner similar to algorithms B and C, but the situation is complicated by the fact that the rejection condition is $A_1$-domination rather than simple domination. Of course, if $B$ dominates $C$, then it trivially $A$-dominates $C$ for any $A$, so one can at least first do a quick check for simple dominance.
Another way to optimize step 2 is to note that if $C + A_i$ is $A_1$-dominated by any combination in $P_0$, then so will be $D + A_i$ for any combination $D$ that includes $C$ as a sub-combination. In particular, we can skip to step 4 as soon as we find that $\emptyset + A_i = A_i$ itself is $A_1$-dominated by some combination already in $P_0$.
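Finally, here is how algorithm D might be sketched in Python, reusing the a_dominates helper from above and the same (cost, value) assumptions. It follows the step-by-step description literally rather than applying the optimizations just described, and its termination rests on the boundedness claim made earlier, so it deserves the same "test before trusting" caveat:

```python
def pareto_multisets_p0(alternatives):
    """Algorithm D: compute P0, the non-dominated multiset combinations
    that avoid A_1.  Alternatives are (cost, value) pairs and are assumed
    to be non-dominated already (see the remark after the lemma)."""
    # Step 1: sort by increasing cost/value ratio; A_1 has the lowest ratio.
    ordered = sorted(alternatives, key=lambda a: a[0] / a[1])
    a1 = ordered[0]
    p0 = [(0, 0, ())]          # (total cost, total value, members)
    seen = {()}                # member tuples already in P0, to avoid duplicates
    for a in ordered[1:]:      # i runs from 2 to n
        added = True
        while added:           # step 3: repeat until P0 stops growing
            added = False
            for cost, value, members in list(p0):
                new_members = tuple(sorted(members + (a,)))
                if new_members in seen:
                    continue   # this multiset is already in P0
                cand = (cost + a[0], value + a[1], new_members)
                # Step 2: reject C* if some combination already in P0
                # A_1-dominates it.
                if any(a_dominates((c, v), cand[:2], a1) for c, v, _ in p0):
                    continue
                p0.append(cand)
                seen.add(new_members)
                added = True
    return p0
```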