Difference between least squares and minimum norm solution

Consider a linear system of equations $Ax = b$.

  • If the system is overdetermined, the least squares (approximate) solution minimizes $||b - Ax||^2$. Some sources also state this as minimizing $||b - Ax||$.

  • If the system is underdetermined, one can calculate the minimum norm solution. But it also minimizes $||b - Ax||$, doesn't it, or am I wrong?

But if the least squares solution is also a minimum norm solution, what is the difference, and what is the rationale for the different names?


Linear system $$ \mathbf{A} x = b $$ where $\mathbf{A}\in\mathbb{C}^{m\times n}_{\rho}$ (an $m\times n$ matrix of rank $\rho$), and the data vector $b\in\mathbb{C}^{m}$.

Least squares problem

Provided that $b\notin\color{red}{\mathcal{N}\left( \mathbf{A}^{*}\right)}$, a least squares solution exists and is defined by $$ x_{LS} = \left\{ x\in\mathbb{C}^{n} \colon \lVert \mathbf{A} x - b \rVert_{2}^{2} \text{ is minimized} \right\} $$

Least squares solution

The minimizers are the affine set computed by $$ x_{LS} = \color{blue}{\mathbf{A}^{+} b} + \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y}, \quad y \in \mathbb{C}^{n} \tag{1} $$ where vectors are colored according to whether they reside in a $\color{blue}{range}$ space or $\color{red}{null}$ space.
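As a minimal numerical sketch (not part of the original answer; it uses numpy and an arbitrary rank-deficient matrix chosen purely for illustration), every member of the affine set $(1)$ attains the same minimal residual $\lVert \mathbf{A}x - b \rVert_{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4x3 matrix of rank 2: the third column is a combination of the first two,
# so N(A) is nontrivial and the least squares minimizer is not unique.
A = rng.standard_normal((4, 2))
A = np.hstack([A, A @ np.array([[1.0], [2.0]])])
b = rng.standard_normal(4)

A_pinv = np.linalg.pinv(A)            # Moore-Penrose pseudoinverse A^+
P_null = np.eye(3) - A_pinv @ A       # orthogonal projector onto N(A)

for _ in range(3):
    y = rng.standard_normal(3)        # arbitrary null space vector y
    x = A_pinv @ b + P_null @ y       # a member of the affine set (1)
    print(np.linalg.norm(A @ x - b))  # the same minimal residual for every y
```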

The red dashed line, a hyperplane in the figure of the original answer, is the set of the least squares minimizers.

Least squares solution of minimum norm

To find the least squares solution of minimum norm, that is, the shortest solution vector, compute the lengths of the solution vectors in $(1)$.

$$ \lVert x_{LS} \rVert_{2}^{2} = \Big\lVert \color{blue}{\mathbf{A}^{+} b} + \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \Big\rVert_{2}^{2} = \Big\lVert \color{blue}{\mathbf{A}^{+} b} \Big\rVert_{2}^{2} + \Big\lVert \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \Big\rVert_{2}^{2} $$ The cross term vanishes because the $\color{blue}{range}$ space and $\color{red}{null}$ space components are orthogonal. The $\color{blue}{range}$ space component is fixed, but we can control the $\color{red}{null}$ space vector. In fact, choose the vector $y$ which forces this term to $0$.

Therefore, the least squares solution of minimum norm is $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{+} b}. $$ This is the point where the red dashed line punctures the blue plane. The least squares solution of minimum length is the unique minimizer that lies in $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*}\right)}$.
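A hedged numerical check of this claim (again with an illustrative rank-deficient matrix, not taken from the original answer): shifting $\mathbf{A}^{+}b$ by any null space vector can only add length, and the Pythagorean split above holds numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative rank-deficient system: a 4x3 matrix of rank 2.
A = rng.standard_normal((4, 2))
A = np.hstack([A, A @ np.array([[1.0], [2.0]])])
b = rng.standard_normal(4)

A_pinv = np.linalg.pinv(A)
P_null = np.eye(3) - A_pinv @ A                   # projector onto N(A)

x_min = A_pinv @ b                                # least squares solution of minimum norm
y = rng.standard_normal(3)
x_other = x_min + P_null @ y                      # some other least squares solution

# The range space and null space components are orthogonal, so the squared norms add.
lhs = np.linalg.norm(x_other) ** 2
rhs = np.linalg.norm(x_min) ** 2 + np.linalg.norm(P_null @ y) ** 2
print(np.isclose(lhs, rhs))                                # True
print(np.linalg.norm(x_min) <= np.linalg.norm(x_other))    # True
```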

Full column rank

You ask about the case of full column rank where $n=\rho$. In this case, $$ \color{red}{\mathcal{N}\left( \mathbf{A} \right)} = \left\{ \mathbf{0} \right\}, $$ the null space is trivial. There is no null space component, and the least squares solution is a point.

In other words, $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{+} b} $$ is always the least squares solution of minimum norm. When the matrix has full column rank, there is no other component to the solution. When the matrix is column rank deficient, the least squares solutions form the affine set $(1)$, the dashed line in the figure.
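A short sketch of the full column rank case, assuming only that `numpy` is available and that the random matrix below indeed has full column rank (which holds with probability one): the pseudoinverse solution coincides with `np.linalg.lstsq` and with the normal equations solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random 5x3 matrix is full column rank with probability one.
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

x_pinv = np.linalg.pinv(A) @ b                      # A^+ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]      # library least squares solver
x_normal = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations (A* A) x = A* b

print(np.allclose(x_pinv, x_lstsq), np.allclose(x_pinv, x_normal))   # True True
```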


First, it's important to understand that there are different norms. Most likely you're interested in the Euclidean norm:

$\| x \|_{2} =\sqrt{\sum_{i=1}^{n}x_{i}^{2}}$

Next, note that minimizing $\| b-Ax \|_{2}^{2}$ is equivalent to minimizing $\| b-Ax \|_{2}$, because squaring the norm is a monotone transform. It really doesn't matter which one you minimize.

If $A$ has full column rank, then there is a unique least squares solution.

However, if $A$ doesn't have full column rank, there may be infinitely many least squares solutions. In this case, we're often interested in the minimum norm least squares solution. That is, among the infinitely many least squares solutions, pick out the least squares solution with the smallest $\| x \|_{2}$. The minimum norm least squares solution is always unique. It can be found using the singular value decomposition and/or the Moore-Penrose pseudoinverse.
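As an illustrative sketch (the matrices here are arbitrary examples, nothing from the question itself), the minimum norm least squares solution can be assembled directly from the SVD by inverting only the singular values above a tolerance; it matches `np.linalg.pinv` and `np.linalg.lstsq`, which also returns the minimum norm solution for rank-deficient problems.

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative rank-deficient system: a 6x4 matrix of rank at most 3.
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))
b = rng.standard_normal(6)

# Thin SVD A = U diag(s) V^T; invert only the singular values above a tolerance.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)
x_svd = Vt.T @ (s_inv * (U.T @ b))                  # A^+ b built from the SVD

x_pinv = np.linalg.pinv(A) @ b                      # pseudoinverse route
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]      # minimum norm least squares solution

print(np.allclose(x_svd, x_pinv), np.allclose(x_svd, x_lstsq))   # True True
```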