Proof of convexity of linear least squares

Another way to prove that a function is convex is to show that its second derivative (the Hessian, if it exists) is positive semi-definite.

$$ \phi: \beta \mapsto \Vert y - X \beta \Vert^2 = \Vert y \Vert^2 - 2 y^T X \beta + \Vert X \beta \Vert^2$$ $\phi$ is twice differentiable; its first derivative (written as a row vector) is

$$ \dfrac {\partial \phi} {\partial \beta} = - 2y^TX + 2(X\beta)^TX = - 2y^TX + 2\beta^TX^TX, $$

and its second derivative (the Hessian) is

$$ \dfrac {\partial^2 \phi} {\partial \beta \partial \beta^T} = 2X^TX, $$

which is positive semi-definite, since $v^T (2X^TX) v = 2 \Vert Xv \Vert^2 \geq 0$ for every vector $v$. Therefore, $\phi$ is a convex function.
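If you want to see this numerically rather than take the algebra on faith, here is a minimal NumPy sketch (the matrix $X$, vector $y$, and all sizes below are arbitrary illustrative choices, not anything from the original question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))   # example design matrix (arbitrary values)
y = rng.standard_normal(10)        # example response vector

# Hessian from the derivation above: 2 X^T X
H = 2 * X.T @ X

# Positive semi-definiteness: all eigenvalues are >= 0 (up to round-off),
# equivalently v^T H v = 2 ||X v||^2 >= 0 for every v.
print(np.linalg.eigvalsh(H))                       # all non-negative
v = rng.standard_normal(3)
print(v @ H @ v, 2 * np.linalg.norm(X @ v) ** 2)   # equal, and >= 0
```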


You want a proof that the function $$ \phi: \beta \mapsto \Vert y - X \beta \Vert^2 = \Vert y \Vert^2 - 2 y^T X \beta + \beta^T X^T X \beta $$ is convex, right? (here $\beta$ and $y$ are vectors and $X$ is a matrix). In other words, you need to prove that $$ \phi(t \beta_1 + (1-t) \beta_2) - \left[ t \phi(\beta_1) + (1-t) \phi(\beta_2) \right] \leq 0 $$ for all $\beta_1, \beta_2$ and $t \in [0,1]$. After expanding (the $\Vert y \Vert^2$ terms and the terms linear in $\beta$ cancel, since the weights $t$ and $1-t$ sum to $1$), the left-hand side becomes $$ t^2 \beta_1^T X^T X \beta_1 + (1-t)^2 \beta_2^T X^T X \beta_2 + 2 t(1-t) \beta_1^T X^T X \beta_2 - t \beta_1^T X^T X \beta_1 - (1-t) \beta_2^T X^T X \beta_2 $$ $$ = - t(1-t) \left[ (\beta_1 - \beta_2)^T X^T X (\beta_1 - \beta_2) \right] = - t(1-t) \Vert X (\beta_1 - \beta_2) \Vert^2$$ which is clearly $\leq 0$.
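As a quick sanity check on the algebra, here is a small NumPy sketch (again with arbitrary example data, chosen only for illustration) verifying that the difference above equals $-t(1-t) \Vert X(\beta_1 - \beta_2) \Vert^2$ and is never positive:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))   # example design matrix (arbitrary values)
y = rng.standard_normal(10)        # example response vector

def phi(beta):
    """Least-squares objective ||y - X beta||^2."""
    return np.sum((y - X @ beta) ** 2)

beta1, beta2 = rng.standard_normal(3), rng.standard_normal(3)
for t in np.linspace(0.0, 1.0, 11):
    lhs = phi(t * beta1 + (1 - t) * beta2) - (t * phi(beta1) + (1 - t) * phi(beta2))
    rhs = -t * (1 - t) * np.linalg.norm(X @ (beta1 - beta2)) ** 2
    assert np.isclose(lhs, rhs) and lhs <= 1e-12   # identity holds, difference <= 0
```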


I decided to make my comment an answer since I think that a few of these are a little complicated due to the careful bookkeeping for derivatives. We say a function $f$ is convex if for any $0 \le \gamma \le 1$ and $x, y \in \mathbb{R}^n$ we have that $$ f(\gamma x + (1-\gamma)y) \le \gamma f(x) + (1-\gamma)f(y). $$

We can now prove it in three (simple!) steps, which I spell out here for the sake of convenience, although these are extraordinarily standard one-line proofs.

  1. First, it's clear that the function $f(x) = x^2$ is convex (prove this in your favorite way).

  2. Now, for any two convex functions $f, g: \mathbb{R}^n \to \mathbb{R}$, their sum is convex since, simply applying the definitions, $$ f(\gamma x + (1-\gamma)y) + g(\gamma x + (1-\gamma)y) \le \gamma (f(x) + g(x)) + (1-\gamma) (f(y) + g(y)). $$

  3. Affine precomposition preserves convexity, since if $f$ is convex, then $$ f(A(\gamma x + (1-\gamma)y) + b) =f(\gamma Ax + (1-\gamma)Ay + (\gamma + (1-\gamma)) b) \le \gamma f(Ax + b) + (1-\gamma)f(Ay+b), $$

so, since the function $$ f(x) = \|Ax - b\|_2^2 = \sum_i (a_i^Tx - b_i)^2 $$ is the sum of a bunch of convex functions precomposed with affine functions, $f$ is convex.
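For completeness, here is a minimal NumPy sketch of this composition argument (the matrix $A$, vector $b$, and test points are arbitrary illustrative values): it builds $f$ row by row as $t \mapsto t^2$ precomposed with affine maps and spot-checks the defining inequality.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 4))    # example matrix (arbitrary values)
b = rng.standard_normal(8)         # example vector

def f(x):
    """||Ax - b||_2^2 as a sum over rows of (a_i^T x - b_i)^2,
    i.e. the convex map t -> t^2 precomposed with affine maps."""
    return sum((a_i @ x - b_i) ** 2 for a_i, b_i in zip(A, b))

# Spot-check the defining inequality f(g x + (1-g) y) <= g f(x) + (1-g) f(y).
x, y = rng.standard_normal(4), rng.standard_normal(4)
for g in np.linspace(0.0, 1.0, 11):
    assert f(g * x + (1 - g) * y) <= g * f(x) + (1 - g) * f(y) + 1e-12
```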