The median minimizes the sum of absolute deviations (the $ {\ell}_{1} $ norm)
Suppose we have a set $S$ of real numbers. Show that $$\sum_{s\in S}|s-x| $$ is minimal if $x$ is equal to the median.
This is a sample exam question of one of the exams that I need to take and I don't know how to proceed.
Introduction: The solution below is essentially the same as the solution given by Brian M. Scott, but it will take a lot longer. You are expected to assume that $S$ is a finite set. with say $k$ elements. Line them up in order, as $s_1<s_2<\cdots <s_k$.
The situation is a little different when $k$ is odd than when $k$ is even. In particular, if $k$ is even there are (depending on the exact definition of median) many medians. We tell the story first for $k$ odd.
Recall that $|x-s_i|$ is the distance between $x$ and $s_i$, so we are trying to minimize the sum of the distances. For example, we have $k$ people who live at various points on the $x$-axis. We want to find the point(s) $x$ such that the sum of the travel distances of the $k$ people to $x$ is a minimum.
The story: Imagine that the $s_i$ are points on the $x$-axis. For clarity, take $k=7$. Start from well to the left of all the $s_i$, and take a tiny step, say of length $\epsilon$, to the right. Then you have gotten $\epsilon$ closer to every one of the $s_i$, so the sum of the distances has decreased by $7\epsilon$.
Keep taking tiny steps to the right, each time getting a decrease of $7\epsilon$. This continues until you hit $s_1$. If you now take a tiny step to the right, then your distance from $s_1$ increases by $\epsilon$, and your distance from each of the remaining $s_i$ decreases by $\epsilon$. What has happened to the sum of the distances? There is a decrease of $6\epsilon$, and an increase of $\epsilon$, for a net decrease of $5\epsilon$ in the sum.
This continues until you hit $s_2$. Now, when you take a tiny step to the right, your distance from each of $s_1$ and $s_2$ increases by $\epsilon$, and your distance from each of the five others decreases by $\epsilon$, for a
net decrease of $3\epsilon$.
This continues until you hit $s_3$. The next tiny step gives an increase of $3\epsilon$, and a decrease of $4\epsilon$, for a net decrease of $\epsilon$.
This continues until you hit $s_4$. The next little step brings a total increase of $4\epsilon$, and a total decrease of $3\epsilon$, for an increase of $\epsilon$. Things get even worse when you travel further to the right. So the minimum sum of distances is reached at $s_4$, the median.
The situation is quite similar if $k$ is even, say $k=6$. As you travel to the right, there is a net decrease at every step, until you hit $s_3$. When you are between $s_3$ and $s_4$, a tiny step of $\epsilon$ increases your distance from each of $s_1$, $s_2$, and $s_3$ by $\epsilon$. But it decreases your distance from each of the three others, for no net gain. Thus any $x$ in the interval from $s_3$ to $s_4$, including the endpoints, minimizes the sum of the distances. In the even case, I prefer to say that any point between the two "middle" points is a median. So the conclusion is that the points that minimize the sum are the medians. But some people prefer to define the median in the even case to be the average of the two "middle" points. Then the median does minimize the sum of the distances, but some other points also do.
We're basically after: $$ \arg \min_{x} \sum_{i = 1}^{N} \left| {s}_{i} - x \right| $$
One should notice that $ \frac{\mathrm{d} \left | x \right | }{\mathrm{d} x} = \operatorname{sign} \left( x \right) $ (Being more rigorous would say it is a Sub Gradient of the non smooth $ {L}_{1} $ Norm function).
Hence, deriving the sum above yields $ \sum_{i = 1}^{N} \operatorname{sign} \left( {s}_{i} - x \right) $.
This equals to zero only when the number of positive items equals the number of negative which happens when $ x = \operatorname{median} \left\{ {s}_{1}, {s}_{2}, \cdots, {s}_{N} \right\} $.
Remarks
- One should notice that the
median
of a discrete group is not uniquely defined. - The median is not necessarily an item within the group.
- Not every set can bring the Sub Gradient to vanish. Yet employing the Sub Gradient Method is guaranteed to converge to median.
- It is not the optimal way to calculate the Median. It is given to give intuition about what's the median.