Formal proof that the mean minimizes the squared error function
In an important machine learning book I found the following proof.
We want to minimize the cost function $J_0(x_0)$ defined by the formula $$J_0(x_0) = \sum_{k=1}^n \|x_0 - x_k \|^2.$$
The solution to this problem is given by $x_0=m$, where $m$ is the sample mean $m = \frac{1}{n}\sum_{k=1}^nx_k$.
Proof. $$ \begin{array}{rcl} J_0(x_0) & = & \sum_{k=1}^n \|(x_0 - m)-(x_k - m) \|^2 \\ & = & \sum_{k=1}^n \|x_0 - m \|^2 -2(x_0-m)^T\sum_{k=1}^n(x_k-m) + \sum_{k=1}^n \|x_k - m \|^2 \\ & = & \sum_{k=1}^n \|x_0 - m \|^2 + \sum_{k=1}^n \|x_k - m \|^2. \end{array}$$
Since $ \sum_{k=1}^n \|x_k - m \|^2 $ is independent of $x_0$, this expression is obviously minimized by $x_0=m$.
I cannot understand this proof. What guarantees that $ \sum_{k=1}^n \|x_k - m \|^2 $ is minimized?
Think of the sample $x_1, \dots, x_n$ as fixed data, so everything computed from it is constant. In particular, $\sum_{k=1}^n \|x_k - m\|^2$ is constant because it does not depend on $x_0$ (the $x_k$ and $m$ are computed from the data alone). So in the last line of the proof you have $J_0(x_0) = \sum_{k=1}^n \|x_0 - m \|^2 + \sum_{k=1}^n \|x_k - m \|^2$, i.e., $$J_0(x_0) = \sum_{k=1}^n \|x_0 - m \|^2 + C,$$ for some constant $C \ge 0$. Thus you only need to minimize $\sum_{k=1}^n \|x_0 - m \|^2$, and this is clearly minimized when $x_0 = m$, since every term is then zero.
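If it helps, here is a quick numerical sanity check (not a substitute for the argument above): evaluate $J_0$ at the sample mean and at a few other candidate points. The data and the use of numpy are purely illustrative choices on my part.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))          # 10 sample points x_1..x_n in R^3 (made-up data)
m = X.mean(axis=0)                    # sample mean

def J0(x0):
    """Sum of squared distances from x0 to every sample point."""
    return np.sum(np.linalg.norm(x0 - X, axis=1) ** 2)

print(J0(m))                          # the smallest attainable value
for _ in range(5):
    v = rng.normal(size=3)            # arbitrary alternative candidates
    assert J0(v) >= J0(m)             # never below J0(m)
```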
Edit 1. The term $2(x_0-m)^T\sum_{k=1}^n(x_k-m)$ vanishes because
$$\sum_{k=1}^n (x_k - m) = \sum_{k=1}^n x_k - \sum_{k=1}^n m = \sum_{k=1}^n x_k - m \times n.$$
Since $m = \frac{1}{n} \sum_{k=1}^n x_k$ we have
$$
\sum_{k=1}^n x_k = m \times n.
$$
Thus
$$
\begin{array}{rcl}
2(x_0-m)^T\sum_{k=1}^n(x_k-m) & = & 2(x_0-m)^T(m \times n - m \times n) \\
& = & 2(x_0-m)^T \times 0 \\
& = & 0.
\end{array}
$$
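Numerically, the summed deviations are zero up to floating-point error, which is exactly why the cross term drops out for every $x_0$. A small check, again assuming numpy and arbitrary made-up data:

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(7, 2))   # arbitrary sample in R^2
m = X.mean(axis=0)

deviations = (X - m).sum(axis=0)                   # sum_k (x_k - m)
print(deviations)                                  # ~[0, 0] up to rounding

x0 = np.array([3.0, -1.0])                         # any x0 whatsoever
cross_term = 2 * (x0 - m) @ deviations             # 2 (x0 - m)^T sum_k (x_k - m)
print(abs(cross_term) < 1e-12)                     # True: the cross term vanishes
```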
Edit 2. If we expand around an arbitrary point $v \ne m$ instead of $m$, we obtain the same kind of decomposition:
$$
\begin{array}{rcl}
J_0(x_0) & = & \sum_{k=1}^n \|(x_0 - v)-(x_k - v) \|^2 \\
& = & \sum_{k=1}^n \|x_0 - v \|^2 -2(x_0-v)^T\sum_{k=1}^n(x_k-v) + \sum_{k=1}^n \|x_k - v \|^2.
\end{array}
$$
In general this cross term does not vanish. It does vanish when $x_0 = v$, but then we merely recover the original expression:
$$
\begin{array}{rcl}
J_0(x_0) & = & \sum_{k=1}^n \|x_0 - x_0 \|^2 -2(x_0-x_0)^T\sum_{k=1}^n(x_k-v) + \sum_{k=1}^n \|x_k - x_0 \|^2 \\
& = & 0 + 0 + \sum_{k=1}^n \|x_k - x_0 \|^2 \\
& = & \sum_{k=1}^n \|x_k - x_0 \|^2 \\
& = & \sum_{k=1}^n \|x_0 - x_k \|^2 .
\end{array}
$$
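To see this concretely, one can check that the cross term around an arbitrary $v \ne m$ is generally nonzero, and collapses to zero only when $x_0 = v$ or when $v = m$. The sketch below, with made-up data, is only an illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))           # arbitrary sample in R^2
m = X.mean(axis=0)

def cross_term(x0, v):
    """2 (x0 - v)^T sum_k (x_k - v): the middle term of the expansion around v."""
    return 2 * (x0 - v) @ (X - v).sum(axis=0)

v = m + 1.0                           # some v different from m
x0 = rng.normal(size=2)               # generic x0
print(cross_term(x0, v))              # generally nonzero
print(cross_term(v, v))               # exactly 0: vanishes when x0 = v
print(abs(cross_term(x0, m)) < 1e-12) # ~0: vanishes for every x0 when v = m
```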
The proof in the book is an existence argument: one picks a candidate value $m$ and shows that it satisfies the claim. An argument of this type does not, by itself, establish uniqueness, so one could imagine that another value also attains the minimum. The book does not prove uniqueness, presumably because it is considered obvious.
Additional. Simplifying your problem, I will assume that we are working with scalars (a collection of plain numbers), to give an alternative proof.
If we have $J_0(x_0) = \sum_{k=1}^n (x_0 - x_k)^2$, then to minimize it we differentiate with respect to $x_0$:
$$ \begin{array}{rcl} \frac{\partial}{\partial x_0} J_0(x_0) & = & \frac{\partial}{\partial x_0} \sum_{k=1}^n (x_0 - x_k)^2 \\ & = & 2 \times \sum_{k=1}^n (x_0 - x_k) \\ & = & 2 \times \left(\sum_{k=1}^n x_0 - \sum_{k=1}^n x_k\right) \\ & = & 2 \times \left(n x_0 - \sum_{k=1}^n x_k\right). \end{array} $$
Now, to minimize $J_0(x_0)$, we need $\partial J_0(x_0) /\partial x_0 = 0$. Thus
$$
\begin{array}{rcl}
2 \times \left(n x_0 - \sum_{k=1}^n x_k\right) & = & 0 \\
n x_0 & = & \sum_{k=1}^n x_k \\
x_0 & = & \frac{1}{n} \sum_{k=1}^n x_k \\
x_0 & = & m. \qquad\qquad \Box
\end{array}
$$
Since $\partial^2 J_0(x_0)/\partial x_0^2 = 2n > 0$, this stationary point is indeed a minimum.
To obtain the same result for vectors, you use the gradient and proceed similarly. Surely you will learn about the normal equations later.
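For the scalar case, the same derivative computation can be reproduced symbolically. A minimal sketch, assuming sympy and a small symbolic sample of five points (the names x1..x5 are my own):

```python
import sympy as sp

x0 = sp.Symbol('x0')
xs = sp.symbols('x1:6')                        # five symbolic sample points x1..x5
J = sum((x0 - xk) ** 2 for xk in xs)           # J_0(x0) = sum_k (x0 - xk)^2

critical = sp.solve(sp.diff(J, x0), x0)        # solve dJ_0/dx0 = 0 for x0
print(critical[0])                             # x1/5 + x2/5 + x3/5 + x4/5 + x5/5
print(sp.simplify(critical[0] - sum(xs) / 5))  # 0: the critical point is the sample mean
```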
That expression is constant. The only variable anywhere in sight is $x_0$; you are supposed to pick the $x_0$ that minimizes $J_0(x_0)$. The only term involving $x_0$ is nonnegative, so it is minimized when it is zero, and choosing $x_0 = m$ achieves exactly that: $$J_0(x_0) = \underbrace{\sum_{k=1}^n \|x_0 - m \|^2}_{\textrm{zero when }x_0=m} + \underbrace{\sum_{k=1}^n \|x_k - m \|^2}_{\textrm{constant w.r.t. }x_0}.$$
Once you understand the computation (the key relation is $\sum (x_k - m) = 0$) you are left with $$ J(x_0) = \sum_{k=1}^n |x_0 - m|^2 + \sum_{k=1}^n |x_k - m|^2 \ge \sum_{k=1}^n |x_k - m|^2 = J(m) $$
This proves that the minimum is attained at $x_0 = m$.
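The inequality is in fact an instance of the exact identity $J(x_0) = n\|x_0 - m\|^2 + J(m)$, which is easy to confirm numerically. A short check, assuming numpy and arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 4))                    # 8 points in R^4 (made-up data)
n = len(X)
m = X.mean(axis=0)

def J(x0):
    """Sum of squared distances from x0 to the sample points."""
    return np.sum(np.linalg.norm(x0 - X, axis=1) ** 2)

x0 = rng.normal(size=4)                        # an arbitrary point
lhs = J(x0)
rhs = n * np.linalg.norm(x0 - m) ** 2 + J(m)   # J(x0) = n ||x0 - m||^2 + J(m)
print(np.isclose(lhs, rhs))                    # True, so J(x0) >= J(m) with equality iff x0 = m
```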
The squared error function $J$ is strictly convex and differentiable, so it has a unique minimizer $\mu$ and its gradient exists everywhere. To find that minimizer, we take the gradient of $J$ at $x$:
$$\nabla J(x) = 2\sum_k (x-x_k).$$
From the necessary condition of optimality it follows that the gradient at the unique minimizer $\mu$ must be zero. Thus,
$$\nabla J(\mu) = 2\sum_k (\mu-x_k) = 0.$$
Solving the above equation for $\mu$ gives the arithmetic mean:
$$
\begin{array}{rcl}
2\sum_k (\mu-x_k) & = & 0 \\
\sum_k \mu - \sum_k x_k & = & 0 \\
n \cdot \mu & = & \sum_k x_k \\
\mu & = & \frac{1}{n}\sum_k x_k.
\end{array}
$$
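Purely as an illustration of this optimality condition (not part of the derivation), plain gradient descent using $\nabla J(x) = 2\sum_k (x - x_k)$ converges to the sample mean. The step size and data below are arbitrary choices, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))              # sample x_1..x_n in R^3 (made-up data)
n = len(X)

x = np.zeros(3)                           # arbitrary starting point
eta = 0.01                                # step size; chosen so that eta < 1/n for convergence
for _ in range(200):
    grad = 2 * (n * x - X.sum(axis=0))    # nabla J(x) = 2 * sum_k (x - x_k)
    x -= eta * grad

print(np.allclose(x, X.mean(axis=0)))     # True: descent lands on the sample mean
```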