The median minimizes the sum of absolute deviations (the ℓ1 norm) [duplicate]
Let $a^{(1)},...,a^{(k)} \in \mathbb{R}^n$.
How can one find the point $x \in \mathbb{R}^n$ which minimizes the sum of squared distances $\sum_{l=1}^{k} \Vert x-a^{(l)} \Vert _2^2$?
I know that the function $h(t) = \sum_{i=1}^{n}(t-x_i)^2$ attains its minimum at $t = \overline{x}$. So for the average $\overline{x}$ of the numbers $x_1,\dots,x_n$, the sum of squared deviations $\overline{x}-x_i$ is minimal.
So for a set of points in the plane
$$ S=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\},$$ a natural candidate for the center is their centroid $$(\bar x,\bar y) = \left(\frac{1}{n}\sum_{i=1}^n x_i,\ \frac{1}{n}\sum_{i=1}^n y_i\right).$$
I don't really know whether this point actually minimizes the sum of squared distances given above. I'd also like to know whether only one such point exists or whether there are more.
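As a quick numerical sanity check (not a proof), one can compare the centroid against whatever a general-purpose optimizer finds. The sketch below assumes NumPy and SciPy are available; the sizes and variable names are just illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
k, n = 10, 3
a = rng.normal(size=(k, n))          # k points a^(1), ..., a^(k) in R^n

def objective(x):
    # sum over l of ||x - a^(l)||_2^2
    return np.sum((x - a) ** 2)

centroid = a.mean(axis=0)            # componentwise average of the points
result = minimize(objective, x0=np.zeros(n))

print(np.allclose(result.x, centroid, atol=1e-5))   # expected: True
print(objective(centroid) <= result.fun + 1e-9)     # expected: True
```

In runs like this the optimizer's solution coincides with the centroid, which is what the answers below prove in general.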
Solution 1:
I believe your intuition is correct.
Consider the following:
$$ \sum_{l=1}^k || x - a^{(l)}||_2^2 = \sum_{l=1}^k \sum_{m=1}^n (x_m - a^{(l)}_m)^2 $$
Here $a_m^{(l)}$ is the $m$th component of the $l$th vector, and $x_m$ is the $m$th component of $x$.
To find the minimum of this with respect to $x$, we can use standard differentiation procedures.
First, differentiate with respect to one direction (i.e. differentiate with respect to some $x_j$): $$ \begin{align} \frac{\partial}{\partial x_j} \sum_{l=1}^k \sum_{m=1}^n (x_m - a^{(l)}_m)^2 &=\sum_{l=1}^k \sum_{m=1}^n \frac{\partial}{\partial x_j} (x_m - a^{(l)}_m)^2 \\ &=\sum_{l=1}^k 2(x_j-a_j^{(l)}) \end{align} $$ Setting this expression to zero (writing $\hat{x}$ for the stationary point), you obtain: $$ \begin{align} \sum_{l=1}^k 2(\hat{x}_j-a_j^{(l)}) &= 0\\ \hat{x}_j &= \frac{1}{k}\sum_{l=1}^k a_j^{(l)} \end{align} $$
This shows that, for each coordinate direction $j$, the function $\sum_{l=1}^k \| x - a^{(l)}\|_2^2$ has a stationary point (which we will show is the global minimum) whose $j$th component is the average of the $j$th components of the vectors $\{a^{(l)}\}_{l=1}^k$.
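A minimal numerical check of this componentwise formula (a sketch assuming NumPy; the arrays are illustrative): the gradient $2\sum_{l}(x - a^{(l)})$ derived above should vanish when $x$ is the componentwise mean.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 8, 5
a = rng.normal(size=(k, n))

x_hat = a.mean(axis=0)                     # componentwise average of the a^(l)
gradient = 2 * np.sum(x_hat - a, axis=0)   # gradient formula from the derivation above
print(np.allclose(gradient, 0.0))          # expected: True
```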
Now, to show that this stationary point is the unique global minimum, we will use the fact that a function $f(x)$ whose Hessian $H$ (the matrix of second derivatives) is positive definite everywhere is strictly convex, so it has at most one minimizer and any stationary point is a global minimum. (We say that $H$ is positive definite if $c^THc > 0$ for all $c \in \mathbb{R}^n\setminus\{0\}$.)
Consider the second derivatives of $\sum_{l=1}^k || x - a^{(l)}||_2^2$:
$$ \begin{align} \frac{\partial^2}{\partial x_i\,\partial x_j} \sum_{l=1}^k \sum_{m=1}^n (x_m - a^{(l)}_m)^2 &=\sum_{l=1}^k \frac{\partial}{\partial x_i}2(x_j-a_j^{(l)})\\ &=\begin{cases} 2k & i=j\\ 0 & i\neq j \end{cases} \end{align} $$
This means that the Hessian is $H = 2k\,I$: diagonal, constant in $x$, with strictly positive entries on the diagonal. Thus $H$ is positive definite, so $\sum_{l=1}^k \| x - a^{(l)}\|_2^2$ is strictly (indeed strongly) convex with respect to $x$.
Hence, we can conclude that the stationary point found above is indeed the unique global minimum.
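To illustrate the Hessian computation numerically (again a sketch assuming NumPy, not part of the proof), one can compare a finite-difference Hessian of the objective with the constant matrix $2k\,I$:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 5, 4
a = rng.normal(size=(k, n))

def f(x):
    # objective: sum over l of ||x - a^(l)||_2^2
    return np.sum((x - a) ** 2)

def numerical_hessian(func, x, h=1e-5):
    # central-difference approximation of the Hessian
    d = len(x)
    H = np.zeros((d, d))
    eye = np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (func(x + h*eye[i] + h*eye[j]) - func(x + h*eye[i] - h*eye[j])
                       - func(x - h*eye[i] + h*eye[j]) + func(x - h*eye[i] - h*eye[j])) / (4 * h * h)
    return H

x0 = rng.normal(size=n)   # the Hessian is constant, so any base point works
print(np.allclose(numerical_hessian(f, x0), 2 * k * np.eye(n), atol=1e-3))  # expected: True
```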
Solution 2:
You can use this relation: $$ \sum_{i=1}^n \|x-a_i\|^2=n\|\bar{a}-x\|^2+\sum_{i=1}^n \|\bar{a}-a_i\|^2 $$ where $\bar{a}=\frac{1}{n}\sum_{i=1}^n a_i$ (here $n$ denotes the number of points, written $k$ in the question).
To minimize with respect to $x$, just take $x=\bar{a}$: this makes the $n\|\bar{a}-x\|^2$ term vanish, and the remaining term does not depend on $x$. Therefore the answer is: $$ x=\bar{a}=\frac{1}{n}\sum_{i=1}^n a_i $$
To prove the first relation, notice that: $$ \sum_{i=1}^{n}\|a_i-\bar{a}\|^2=\dots=\left(\sum_{i=1}^{n}\|a_i\|^2\right)-n\|\bar{a}\|^2 $$ then simply expand $$ \sum_{i=1}^n \|x-a_i\|^2=\sum_{i=1}^n (\|x\|^2-2\langle x,a_i \rangle+\|a_i\|^2)=\dots $$ (I can write the details if you want, just ask)
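For reference, one way to fill in the omitted expansion (a standard computation):
$$ \begin{align} \sum_{i=1}^n \|x-a_i\|^2 &= \sum_{i=1}^n \left(\|x\|^2 - 2\langle x, a_i\rangle + \|a_i\|^2\right)\\ &= n\|x\|^2 - 2n\langle x, \bar{a}\rangle + \sum_{i=1}^n \|a_i\|^2\\ &= n\left(\|x\|^2 - 2\langle x,\bar{a}\rangle + \|\bar{a}\|^2\right) + \left(\sum_{i=1}^n \|a_i\|^2 - n\|\bar{a}\|^2\right)\\ &= n\|x-\bar{a}\|^2 + \sum_{i=1}^n \|\bar{a} - a_i\|^2, \end{align} $$
where the last step uses the identity noted above.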