Why does SVD provide the least squares and least norm solution to $ A x = b $?

I am studying the singular value decomposition and its properties. It is widely used to solve equations of the form $Ax=b$. I have seen the following: given the system $Ax=b$, we compute the SVD of $A$ as $A=U\Sigma V^T$, and then calculate $x'= V \Sigma^{+}U^Tb$. Here $\Sigma^{+}$ has the reciprocals $\left(\dfrac{1}{\sigma_i}\right)$ of the nonzero singular values on its diagonal and zeros where $\sigma_i=0$. If $b$ is in the range of $A$, then $x'$ is the solution of minimum norm (closest to the origin). If $b$ is not in the range, then $x'$ is the least-squares solution.

I fail to see how exactly this procedure always produces an $x'$ that is closest to the origin when $b$ is in the range of $A$. (I can see that the least-squares solution is an extension of this "closest to origin" property.) How can we show this property of the SVD, geometrically and intuitively if possible?
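For concreteness, here is the procedure as I understand it, in a minimal NumPy sketch (the matrix and vector below are made up for illustration only):

```python
import numpy as np

# A small made-up example (the matrix and vector are illustrative only)
A = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.],
              [2., 1., 1.]])
b = np.array([1., 2., 3., 4.])

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+: reciprocals of the nonzero singular values, zeros elsewhere
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])

# x' = V Sigma^+ U^T b
x_prime = Vt.T @ (s_plus * (U.T @ b))

# Should match NumPy's pseudoinverse solution
print(np.allclose(x_prime, np.linalg.pinv(A) @ b))   # True
```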


First, consider the problem $\Sigma x = b$, where
$$ \Sigma = \pmatrix{\sigma_1\\& \ddots\\&&\sigma_r\\ &&&0\\&&&&\ddots\\&&&&&0} $$
Note that $b$ is in the range of $\Sigma$ only if its entries $b_{r+1},\dots,b_n$ are all zero. Furthermore, the entries $x_{r+1},\dots,x_n$ of $x$ have no effect on the residual $\|\Sigma x - b\|$; they only add to the length of $x$. So you should be able to convince yourself (geometrically or otherwise) that the minimum-norm least-squares solution must be
$$ x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T = \Sigma^+ b $$

From there, note that
$$ U\Sigma V^T x = b \implies \Sigma (V^T x ) = U^T b $$
Since $U$ is orthogonal, $\|Ax - b\| = \|U^T(Ax - b)\| = \|\Sigma (V^T x) - U^T b\|$, so minimizing over $V^T x$ is the same as minimizing over $x$. By the above argument, the minimum-norm least-squares solution for $(V^T x)$ is given by $V^T x = \Sigma^+ U^T b$. Noting that $\|V^T x\| = \|x\|$, we can conclude that $x = (V \Sigma ^+ U^T)b$ must be the minimum-norm least-squares solution (for $x$).
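A quick numerical check of this argument, as a sketch with an arbitrary rank-deficient matrix (the data is made up): the SVD-based solution agrees with NumPy's minimum-norm least-squares solver, and the orthogonal matrix $V^T$ preserves lengths.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient 5x4 matrix: rank 2, so Sigma has trailing zeros
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(5)   # generically not in the range of A

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])

x = Vt.T @ (s_plus * (U.T @ b))          # x = V Sigma^+ U^T b

# lstsq returns the minimum-norm least-squares solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_lstsq))           # True

# Orthogonal matrices preserve length: ||V^T x|| = ||x||
print(np.isclose(np.linalg.norm(Vt @ x), np.linalg.norm(x)))  # True
```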

I hope you find this explanation sufficient.


The pseudoinverse solution from the SVD is derived in the answer to "proving standard least square problem with SVD".

Given $\mathbf{A}x=b$, where the data vector $b\notin\mathcal{N}\left( \mathbf{A}^{*} \right)$, the least squares solution exists and is given by $$ x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y}, \quad y\in\mathbb{C}^{n} $$ where blue vectors are in the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*} \right)}$ and red vectors are in the null space $\color{red}{\mathcal{N}\left( \mathbf{A} \right)}.$ The least squares solutions minimize the sum of the squares of the residual errors, and the solution set is an affine space. That is, $$ \lVert \mathbf{A} x_{LS} (y) - b \rVert_{2}^{2} = r^{2}_{min} $$ for all values of $y$.
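As a numerical illustration of this affine family (a sketch; the matrix, data, and choices of $y$ are arbitrary), every member of the family produces the same residual norm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Rank-2 6x4 matrix, so N(A) is nontrivial and the family is genuinely affine
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

A_dag = np.linalg.pinv(A)
x_particular = A_dag @ b                 # the blue vector, in R(A*)
P_null = np.eye(4) - A_dag @ A           # projector onto N(A)

# Every member of the affine family has the same residual norm
for _ in range(3):
    y = rng.standard_normal(4)
    x_ls = x_particular + P_null @ y
    print(np.linalg.norm(A @ x_ls - b))  # identical values
```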

What is the vector in this affine space with the smallest length? Because the blue and red components lie in orthogonal subspaces, $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*} \right)} \perp \color{red}{\mathcal{N}\left( \mathbf{A} \right)}$, the Pythagorean theorem splits the length of the solution vectors: $$ \lVert x_{LS} \rVert_{2}^{2} = \lVert \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y} \rVert_{2}^{2} = \lVert \color{blue}{\mathbf{A}^{\dagger}b} \rVert_{2}^{2} + \underbrace{\lVert \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y} \rVert_{2}^{2}}_{\text{vanishes when } y=\mathbf{0}} $$ Only the second term depends on $y$, so the solution vector of minimum length is $\color{blue}{\mathbf{A}^{\dagger}b}$, the point in the affine space closest to the origin.
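A sketch verifying this Pythagorean split numerically (again with made-up data): the blue and red components are orthogonal, their squared norms add, and the blue component alone is never longer than the sum.

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

A_dag = np.linalg.pinv(A)
blue = A_dag @ b                    # component in R(A*)
P_null = np.eye(4) - A_dag @ A
red = P_null @ rng.standard_normal(4)   # component in N(A)

# The two components are orthogonal, so the norms add in Pythagorean fashion
print(np.isclose(blue @ red, 0.0))  # True
lhs = np.linalg.norm(blue + red)**2
rhs = np.linalg.norm(blue)**2 + np.linalg.norm(red)**2
print(np.isclose(lhs, rhs))         # True

# Hence ||A^+ b|| <= ||x_LS(y)|| for every y, with equality iff red = 0
print(np.linalg.norm(blue) <= np.linalg.norm(blue + red))  # True
```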