When is $\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I}$ invertible?

The question is quite simple: for an $N \times p$ matrix $\mathbf{X}$ with real entries, when is $\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I}$ invertible (where $\mathbf{I}$ is the $p \times p$ identity matrix and $\lambda > 0$)?

This comes up in ridge regression. In The Elements of Statistical Learning (Hastie et al.), the authors write:

[The equation] adds a positive constant to the diagonal of $\mathbf{X}^{T}\mathbf{X}$ before inversion. This makes the problem nonsingular, even if $\mathbf{X}^{T}\mathbf{X}$ is not of full rank.

I know that $\mathbf{X}^{T}\mathbf{X}$ is invertible if and only if it has full rank, which holds if and only if $\mathbf{X}$ has full column rank. The book's claim is intuitive enough, but how do I prove it?
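As a quick numerical sanity check before any proof, here is a minimal NumPy sketch (the duplicated column, the dimensions, and $\lambda = 0.1$ are arbitrary choices, used only to manufacture a rank-deficient $\mathbf{X}$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 10, 3, 0.1
X = rng.standard_normal((N, p))
X[:, 2] = X[:, 0]            # duplicate a column, so rank(X) = 2 < p

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))                    # 2: X^T X is singular
print(np.linalg.matrix_rank(XtX + lam * np.eye(p)))  # 3: full rank

# The ridge system is therefore always solvable:
y = rng.standard_normal(N)
beta = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
```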


Solution 1:

$X^TX+\lambda I$ is always invertible if $\lambda>0$.

Proof. Note that, if $u\in\mathbb R^p\setminus\{0\}$, then $$ \langle(X^TX+\lambda I)u,u\rangle =\lambda\langle u,u\rangle+\langle X^T Xu,u\rangle = \lambda\langle u,u\rangle+\langle Xu,Xu\rangle \ge \lambda\langle u,u\rangle>0. $$ Hence $(X^TX+\lambda I)u\ne 0$ for all $u\in\mathbb R^p\setminus\{0\}$; that is, $X^TX+\lambda I$ has trivial null space, and since it is a square matrix, it is invertible.
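Equivalently, one can phrase the same argument spectrally (this reformulation is an add-on, not part of the proof above): $X^TX$ is symmetric positive semidefinite, so its eigenvalues $\mu_1,\dots,\mu_p$ satisfy $\mu_i\ge 0$, and the eigenvalues of $X^TX+\lambda I$ are $$ \mu_i+\lambda \ \ge\ \lambda \ >\ 0,\qquad i=1,\dots,p, $$ so $\det(X^TX+\lambda I)=\prod_{i=1}^p(\mu_i+\lambda)>0$ and the matrix is invertible.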

Note. By $\langle\cdot,\cdot\rangle$ we denote the standard inner product (on $\mathbb R^p$ or on $\mathbb R^N$, as appropriate), and we have used the fact that, for an $N\times p$ matrix $A$, $$ \langle Ax,y\rangle=\langle x,A^Ty\rangle,\quad\text{for all $x\in\mathbb R^p$ and $y\in\mathbb R^N$}. $$ Also, note that $X^TX+\lambda I$ is invertible for $\lambda>0$ even when $X$ is not a square matrix!
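To illustrate that last remark numerically, here is a minimal sketch (the wide random matrix and $\lambda = 0.5$ are arbitrary; with $N < p$, the $p \times p$ matrix $X^TX$ has rank at most $N < p$ and is necessarily singular):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, lam = 3, 5, 0.5
X = rng.standard_normal((N, p))   # wide: rank(X^T X) <= N < p

XtX = X.T @ X
print(np.linalg.eigvalsh(XtX))                    # two eigenvalues ~ 0 (PSD, rank <= 3)
print(np.linalg.eigvalsh(XtX + lam * np.eye(p)))  # all >= lam > 0
```

The second printout is just the first shifted up by $\lambda$, which is exactly the spectral statement above: every eigenvalue of $X^TX+\lambda I$ is at least $\lambda>0$.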