How do you compute numerically the Earth mover's distance (EMD)?
I was trying to compute numerically (write a program) that calculated the EMD for two probability distribution $p_X$ and $q_X$. However, I had a hard time finding an outline of how to exactly compute such a distance.
I was wondering if there existed a closed form equation for EMD or if there existed an outline of an algorithm to compute it.
Just like there is very compact equation for say, the KL-divergence $D(p_X||q_X) = \sum_{x \in X}p_X(x) log\frac{p_X(x)}{q_X(x)}$
Is there one such equation for EMD or an algorithm for EMD such that it can be easily numerically computed?
What I have found so far is the following, that $EMD(p_X,q_X)$ between distributions $p_X$ and $q_X$ is:
$$EMD(P,Q) = \frac{\sum^m_{i=1}\sum^n_{j=1} f_{ij}d_{ij}}{\sum^m_{i=1}\sum^b_{j=1} f_{ij}}$$
However, to find $d_{ij}$ one needs to define some distance measure I guess and to find $f_{ij}$ one needs to solve the transportation linear programming defined in:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/RUBNER/emd.htm#RUBNER98A
Isn't there a solution to the transportation problem such that there is an easy way of evaluating EMD?
Also, I found a wikipedia section to compute it but, the pseudo-code was to ambiguous for me to understand how to actually compute the EMD. If someone understands that section better and can explain it, it would awesome!
Here is the link
http://en.wikipedia.org/wiki/Earth_mover's_distance
or you can just see the pseudo-code right here:
The Pseudo-code/wikipedia section:
If the domain D is discrete, the EMD can be computed by solving an instance transportation problem, which can be solved by the so-called Hungarian algorithm. In particular, if D is a one-dimensional array of "bins" the EMD can be efficiently computed by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins. For example:
$EMD_0 = 0 \\$
$EMD_{i+1} = ( A_i + EMD_i ) - B_i \\$
$TotalDistance = \sum | EMD_i | \\$
I guess what I don't understand is what $A_i$ and $B_i$ are and when the algorithm even stops running. Its just to unclear to me what its doing and I guess I don't understand EMD well enough to derive it myself (If I could I would!)
As an extension to my question, is it a problem if the sample space for the probability distributions is infinite?
I probably won't accept answer that are not as general as possible, but an answer clarifying what wikipedia article is trying to say, would definitively get an up vote.
Solution 1:
The Wikipedia algorithm is correct, but a little terse. A simple example should clear it up if we stick to actual earth moving. Histogram $A$ is three piles of dirt (earth) in a row: $A_0$, $A_1$, and $A_2$, and histogram $B$ is three piles of dirt $B_0$, $B_1$, and $B_2$ in a row.
Suppose the number of shovelfuls in each dirt pile is as follows.
$A_0 = 3$, $A_1 = 2$, $A_2 = 1$
$B_0 = 1$, $B_1 = 2$, $B_2 = 3$
So to make $A$ look like $B$, we really need to take two shovelfuls of dirt from $A_0$ and put them in $A_2$; i.e., move two shovelfuls two units of distance: that's four units of work. The algorithm has us move the two shovelfuls from $A_0$ to $A_1$, and then two from $A_1$ to $A_2$, using $EMD_{i+1} = (A_i + EMD_i) - B_i$.
$$ EMD_0 = 0 \\ EMD_1 = (3 + 0) - 1 = 2 \\ EMD_2 = (2 + 2) - 2 = 2 \\ EMD_3 = (1 + 2) - 3 = 0 \\ \Rightarrow EMD = 2 + 2 + 0 = 4 $$
Solution 2:
In the discrete algorithm you quote, $A_i$ seems to be the amount "of earth" at position $i$ in the first distribution and $B_i$ the amount "of earth" at position $i$ in the second distribution, with $i$ taking values in the non-negative integers. The algorithm as stated needs just one pass, and so long as the two total amounts "of earth" are the same and the ranges of the distributions are bounded, the algorithm will terminate.
For two probability distributions on the real numbers, where the total amounts "of probability" are each $1$ by definition, suppose their cumulative distribution functions are $F_a(x)$ and $F_b(x)$. Then the Earth Mover Distance becomes $$\int_{x=-\infty}^{\infty} |F_a(x) - F_b(x)|\, dx . $$
If the two distributions have densities $f_a(x)$ and $f_b(x)$ this is equivalent to $$\int_{x=-\infty}^{\infty} \left|\int_{y=-\infty}^x(f_a(y) - f_b(y))\,dy\,\right|\, dx.$$
As Niels Diepeveen says in the comments, this changes with distributions on other metric spaces, but I get the impression you are not asking about that.