Pseudo-inverse of a matrix that is neither fat nor tall?

Given a matrix $A\in\mathbb R^{m\times n}$, let us define:

  • $A$ is a fat matrix if $m\le n$ and $\text{null}(A^T)=\{0\}$
  • $A$ is a tall matrix if $m\ge n$ and $\text{null}(A)=\{0\}$

Using the finite rank lemma, we can find that:

  • When $A$ is a fat matrix, its (right) pseudo-inverse is $A^\dagger = A^T(AA^T)^{-1}$
  • When $A$ is a tall matrix, its (left) pseudo-inverse is $A^\ddagger = (A^TA)^{-1}A^T$
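As a quick numerical sanity check (a minimal NumPy sketch I am adding; the matrices are arbitrary full-rank examples), both closed forms agree with numpy.linalg.pinv:

    import numpy as np

    # Fat example: full row rank, so null(F^T) = {0} and F F^T is invertible
    F = np.array([[1., 0., 2.],
                  [0., 1., 3.]])
    right_pinv = F.T @ np.linalg.inv(F @ F.T)
    print(np.allclose(right_pinv, np.linalg.pinv(F)))   # True

    # Tall example: full column rank, so null(T) = {0} and T^T T is invertible
    T = F.T
    left_pinv = np.linalg.inv(T.T @ T) @ T.T
    print(np.allclose(left_pinv, np.linalg.pinv(T)))    # True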

My question is: what is the pseudo-inverse when $A$ is neither fat nor tall (in the sense of the above definitions), i.e. when $\text{null}(A^T)\ne \{0\}$ and $\text{null}(A)\ne \{0\}$ (both null spaces are non-trivial, so $A$ has neither full row rank nor full column rank)? An example of such a matrix is:

$$ A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix} $$

which clearly does not map onto all of $\mathbb R^4$ and whose null space is $\text{span}\left\{\begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix}\right\}$.
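A short NumPy check (my addition) confirms the rank deficiency on both sides, and shows that a pseudo-inverse is still produced:

    import numpy as np

    A = np.array([[1., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 0.],
                  [0., 3., 0.]])
    print(np.linalg.matrix_rank(A))   # 2: less than both m = 4 and n = 3
    U, s, Vt = np.linalg.svd(A)
    print(Vt[2])                      # ~ [0, 0, 1] (up to sign): basis of null(A)
    print(np.linalg.pinv(A))          # a 3x4 pseudo-inverse exists regardless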


There is no need to study both the fat case and the tall case. Let $B = A^t$. Then the tall-matrix problem for $B$ is the fat-matrix problem for $A$, and vice versa.

Thus, we consider only the tall case. Assume that the rank of $A$ is $r$, where $r \leq n$. Let the SVD (singular value decomposition) of $A$ be $A = U \Sigma V^t$, where $U$ is an $m \times r$ matrix with orthonormal columns, $V$ is an $n \times r$ matrix with orthonormal columns, and the diagonal $r \times r$ matrix $\Sigma$ carries the $r$ singular values on its diagonal. These diagonal values $\sigma_i$, $i=1 \colon r$, are ordered in non-increasing order.

The Moore-Penrose pseudoinverse is then given by $$ A^{+} = V \Sigma^{-1} U^t . $$ This pseudoinverse exists even when $A^tA$ is singular.
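In NumPy terms (a sketch I am adding, with the truncation to the $r$ nonzero singular values made explicit), this construction reproduces numpy.linalg.pinv for the matrix in the question:

    import numpy as np

    A = np.array([[1., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 0.],
                  [0., 3., 0.]])
    U, s, Vt = np.linalg.svd(A)        # s ~ [3.7519, 0.9610, 0]
    r = int(np.sum(s > 1e-10))         # numerical rank, here r = 2
    A_plus = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
    print(np.allclose(A_plus, np.linalg.pinv(A)))   # True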

Numerical analysts will tell you to count among the "zero singular values" any singular value that is "tiny", meaning on the order of $\|A\|\,\epsilon$ (any norm of $A$ times the machine precision). Unfortunately, the computed Moore-Penrose inverse often depends on how "tiny" is defined. Mathematicians do not have this problem, since zero means zero and nothing else.
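This sensitivity is easy to provoke (an illustrative sketch of mine; in NumPy the cutoff is exposed through the rcond argument of numpy.linalg.pinv):

    import numpy as np

    B = np.diag([1.0, 1e-12])     # one "tiny" singular value
    # Cutoff above 1e-12: the tiny value is treated as zero
    print(np.linalg.norm(np.linalg.pinv(B, rcond=1e-10)))   # 1.0
    # Cutoff below 1e-12: its reciprocal 1e12 dominates the answer
    print(np.linalg.norm(np.linalg.pinv(B, rcond=1e-15)))   # ~1e12

The two answers differ by twelve orders of magnitude, which is exactly the dependence on "tiny" described above.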

For your matrix $A$, there are two non-zero singular values (approximately 3.7519 and 0.9610), hence $r=2$. Since there are no tiny singular values, the Moore-Penrose inverse is well defined.


Fundamental Theorem of Linear Algebra

The matrix $\mathbf{A} \in \mathbb{C}^{m\times n}$ induces the four fundamental subspaces. In finite dimension, the domain $\mathbb{C}^{n}$ and the codomain $\mathbb{C}^{m}$ decompose as $$ \begin{align} \mathbb{C}^{n} &= \color{blue}{\mathcal{R} \left( \mathbf{A}^{*} \right)} \oplus \color{red}{\mathcal{N} \left( \mathbf{A} \right)} \\ \mathbb{C}^{m} &= \color{blue}{\mathcal{R} \left( \mathbf{A} \right)} \oplus \color{red}{\mathcal{N} \left( \mathbf{A}^{*} \right)} \end{align} $$

Least squares

Consider a data vector $b \in \mathbb{C}^{m}$ that does not lie entirely in the left null space, that is, $b \notin \color{red}{\mathcal{N} \left( \mathbf{A}^{*} \right)}$. We are solving for the least squares minimizers defined as $$ x_{LS} = \left\{ x\in\mathbb{C}^{n} \colon \lVert \mathbf{A} x - b \rVert_{2}^{2} \text{ is minimized} \right\}. $$ These solutions are $$ x_{LS} = \color{blue}{\mathbf{A}^{\dagger} b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A} \right) y}, \quad y\in\mathbb{C}^{n}. $$ The Moore-Penrose pseudoinverse matrix is $$ \begin{align} \mathbf{A}^{\dagger} &= \mathbf{V} \, \Sigma^{\dagger} \mathbf{U}^{*} \\ &= \left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}}} & \color{red}{\mathbf{V}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}_{\rho\times \rho}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{U}_{\mathcal{N}}^{*}} \end{array} \right] \end{align} $$
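A numerical illustration (my sketch, over the reals for simplicity, using the asker's matrix): every member of the family attains the same minimal residual, and the $y=0$ member is the one of minimum norm.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[1., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 0.],
                  [0., 3., 0.]])
    b = rng.standard_normal(4)
    A_dag = np.linalg.pinv(A)
    P_null = np.eye(3) - A_dag @ A      # projector onto null(A)
    x0 = A_dag @ b                      # minimum-norm least-squares solution
    for _ in range(3):
        x = x0 + P_null @ rng.standard_normal(3)
        print(np.linalg.norm(A @ x - b))            # identical residual each time
    print(np.linalg.norm(x0) <= np.linalg.norm(x))  # True: x0 is the shortest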

SVD General case

The singular value decomposition provides an orthonormal basis for the four subspaces. The range spaces are aligned and the difference in length scales is expressed in the singular values.

The decomposition for a matrix with rank $\rho$ is $$ \begin{align} \mathbf{A} &= \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \\ &= \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{cccccc} \sigma_{1} & & & & & \\ & \ddots & & & & \\ & & \sigma_{\rho} & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{V}_{\mathcal{N}}^{*}} \end{array} \right] \\ &= \left[ \begin{array}{cccccc} \color{blue}{u_{1}} & \dots & \color{blue}{u_{\rho}} & \color{red}{u_{\rho+1}} & \dots & \color{red}{u_{m}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}_{\rho\times \rho} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{v_{1}^{*}} \\ \vdots \\ \color{blue}{v_{\rho}^{*}} \\ \color{red}{v_{\rho+1}^{*}} \\ \vdots \\ \color{red}{v_{n}^{*}} \end{array} \right] \end{align} $$
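The partition is mechanical to reproduce (a sketch I am adding, thresholding at $10^{-10}$ to decide $\rho$): the full SVD hands back orthonormal bases for all four subspaces at once.

    import numpy as np

    A = np.array([[1., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 0.],
                  [0., 3., 0.]])
    U, s, Vt = np.linalg.svd(A)            # U is 4x4, Vt is 3x3
    rho = int(np.sum(s > 1e-10))           # rho = 2
    U_R, U_N = U[:, :rho], U[:, rho:]      # bases for range(A), null(A^*)
    V_R, V_N = Vt[:rho].T, Vt[rho:].T      # bases for range(A^*), null(A)
    print(np.allclose(U_N.T @ A, 0))       # null(A^*) annihilates A from the left
    print(np.allclose(A @ V_N, 0))         # null(A) is annihilated from the right

Let's look at your special cases.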

Tall: $m>n$

The overdetermined case of full column rank $\rho = n$.

$$ \begin{align} \mathbf{A} &= \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \\ &= \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{c} \mathbf{S}_{\rho\times \rho} \\ \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}^{*}} \end{array} \right] \end{align} $$ $$ \begin{align} \mathbf{A}^{\dagger} &= \mathbf{V} \, \Sigma^{\dagger} \mathbf{U}^{*} \\ &= \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}_{\rho\times \rho}^{-1} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{U}_{\mathcal{N}}^{*}} \end{array} \right] \end{align} $$
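For a concrete tall matrix (my example), the thin SVD returns exactly these blocks, and the result coincides with the left inverse $(\mathbf{A}^{*}\mathbf{A})^{-1}\mathbf{A}^{*}$ from the question:

    import numpy as np

    T = np.array([[1., 0.],
                  [0., 1.],
                  [2., 3.]])               # m = 3 > n = 2, full column rank
    U, s, Vt = np.linalg.svd(T, full_matrices=False)   # thin SVD: U is 3x2
    T_dag = Vt.T @ np.diag(1.0 / s) @ U.T
    print(np.allclose(T_dag, np.linalg.inv(T.T @ T) @ T.T))  # matches (A^T A)^{-1} A^T
    print(np.allclose(T_dag @ T, np.eye(2)))                 # left inverse: A† A = I_n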

Wide: $m<n$

The underdetermined case of full row rank $\rho = m$.

$$ \begin{align} \mathbf{A} &= \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \\ &= \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}_{\rho\times \rho} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{V}_{\mathcal{N}}^{*}} \end{array} \right] \end{align} $$ $$ \begin{align} \mathbf{A}^{\dagger} &= \mathbf{V} \, \Sigma^{\dagger} \mathbf{U}^{*} \\ &= \left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}}} & \color{red}{\mathbf{V}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{c} \mathbf{S}_{\rho\times \rho}^{-1} \\ \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}^{*}} \end{array} \right] \end{align} $$
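The mirror-image check for a wide matrix (again my example): here $\mathbf{A}^{\dagger}$ is a right inverse and agrees with $\mathbf{A}^{*}(\mathbf{A}\mathbf{A}^{*})^{-1}$:

    import numpy as np

    W = np.array([[1., 0., 2.],
                  [0., 1., 3.]])           # m = 2 < n = 3, full row rank
    W_dag = np.linalg.pinv(W)
    print(np.allclose(W_dag, W.T @ np.linalg.inv(W @ W.T)))  # matches A^T (A A^T)^{-1}
    print(np.allclose(W @ W_dag, np.eye(2)))                 # right inverse: A A† = I_m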

Square: $m=n$

The nonsingular case $\rho = m = n$.

$$ \begin{align} \mathbf{A} &= \mathbf{U} \, \Sigma \, \mathbf{V}^{*} = \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}} \end{array} \right] \left[ \begin{array}{c} \mathbf{S} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}^{*}} \end{array} \right] \end{align} $$ $$ \begin{align} \mathbf{A}^{\dagger} &= \mathbf{V} \, \Sigma^{\dagger} \mathbf{U}^{*} = \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}} \end{array} \right] \left[ \begin{array}{c} \mathbf{S}^{-1} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}^{*}} \end{array} \right] \end{align} $$
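In this case the pseudoinverse is simply the inverse (a trivial check, my addition):

    import numpy as np

    S = np.array([[2., 1.],
                  [1., 2.]])               # nonsingular
    print(np.allclose(np.linalg.pinv(S), np.linalg.inv(S)))  # True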


Original example

@Vini presents the solution clearly. To emphasize his conclusions, here are the background computations. The product matrix is $$ \mathbf{A}^{\mathrm{T}}\mathbf{A} = \left[ \begin{array}{ccc} 1 & 1 & 0 \\ 1 & 14 & 0 \\ 0 & 0 & 0 \\ \end{array} \right], $$ which quickly leads to the characteristic polynomial $$ p(\lambda) = \left( -\lambda^{2} + 15 \lambda - 13\right) \lambda. $$ The eigenvalue spectrum is then $$ \lambda \left( \mathbf{A}^{\mathrm{T}}\mathbf{A} \right) = \left\{ \frac{1}{2} \left( 15 \pm \sqrt{173} \right), 0 \right\}. $$ The singular values are the square roots of the nonzero eigenvalues, $$ \left\{ \sigma_{1}, \sigma_{2} \right\} = \left\{ \sqrt{\frac{1}{2} \left( 15 + \sqrt{173} \right)}, \sqrt{\frac{1}{2} \left( 15 -\sqrt{173} \right)} \right\}. $$ There are two nonzero singular values; therefore the matrix rank is $\rho = 2$.

Let $$ \mathbf{S} = \left( \begin{array}{cc} \sigma_{1} & 0 \\ 0 & \sigma_{2} \\ \end{array} \right). $$ This diagonal matrix is embedded in the sabot matrix, padded with zeros to ensure conformability between the matrices $\mathbf{U}$ and $\mathbf{V}$: $$ \Sigma = \left[ \begin{array}{ccc} \sqrt{\frac{1}{2} \left(15+\sqrt{173}\right)} & 0 & 0 \\ 0 & \sqrt{\frac{1}{2} \left(15-\sqrt{173}\right)} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} \right] = \left( \begin{array}{cc} \mathbf{S} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \end{array} \right). $$ The pseudoinverse of the $4\times 3$ matrix $\Sigma$ is the $3\times 4$ matrix $$ \Sigma^{\dagger} = \left[ \begin{array}{cccc} \sqrt{\frac{1}{2} \left(15+\sqrt{173}\right)}^{\,-1} & 0 & 0 & 0 \\ 0 & \sqrt{\frac{1}{2} \left(15-\sqrt{173}\right)}^{\,-1} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{array} \right] = \left( \begin{array}{cc} \mathbf{S}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \end{array} \right). $$ Because $m \ne n$ here, the two products are projectors of different sizes rather than identical matrices: $$ \Sigma^{\dagger}\Sigma = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ \end{array} \right], \qquad \Sigma\,\Sigma^{\dagger} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{array} \right], $$ both of the block form $\left[ \begin{array}{cc} \mathbf{I}_{2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \end{array} \right]$.
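A few lines of NumPy (my addition) confirm the block structure and the two differently sized projectors:

    import numpy as np

    s1 = np.sqrt((15 + np.sqrt(173)) / 2)
    s2 = np.sqrt((15 - np.sqrt(173)) / 2)
    Sigma = np.zeros((4, 3))
    Sigma[0, 0], Sigma[1, 1] = s1, s2              # the 4x3 sabot matrix
    Sigma_dag = np.zeros((3, 4))
    Sigma_dag[0, 0], Sigma_dag[1, 1] = 1/s1, 1/s2  # its 3x4 pseudoinverse
    print(Sigma_dag @ Sigma)   # 3x3 projector diag(1, 1, 0)
    print(Sigma @ Sigma_dag)   # 4x4 projector diag(1, 1, 0, 0)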

Another example of $\Sigma$ gymnastics is here: Singular Value Decomposition: Prove that singular values of $A$ are square roots of eigenvalues of both $AA^{T}$ and $A^{T}A$.

The eigenvectors of the product matrix are $$ v_{1} = \left[ \begin{array}{c} \frac{1}{2} \left(-13+\sqrt{173}\right) \\ 1 \\ 0 \end{array} \right], \quad v_{2} = \left[ \begin{array}{c} \frac{1}{2} \left(-13-\sqrt{173}\right) \\ 1 \\ 0 \end{array} \right]. $$ Their normalized forms constitute the first two column vectors of the domain matrix $$ \mathbf{V} = \left[ \begin{array}{cc} \left( 1 + \left(\frac{-13 + \sqrt{173}}{2}\right)^2 \right)^{-\frac{1}{2}} \left[ \begin{array}{c} \frac{1}{2} \left(-13+\sqrt{173}\right) \\ 1 \\ 0 \end{array} \right] & \left( 1 + \left(\frac{13 + \sqrt{173}}{2}\right)^2 \right)^{-\frac{1}{2}} \left[ \begin{array}{c} \frac{1}{2} \left(-13-\sqrt{173}\right) \\ 1 \\ 0 \end{array} \right] \end{array} \right], $$ and the third column, spanning $\color{red}{\mathcal{N}(\mathbf{A})}$, is $v_{3} = \left[ 0, 0, 1 \right]^{\mathrm{T}}$.
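Numerically (my sketch), these normalized eigenvectors agree, up to sign, with the right singular vectors returned by the SVD:

    import numpy as np

    A = np.array([[1., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 0.],
                  [0., 3., 0.]])
    lam, Q = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    _, _, Vt = np.linalg.svd(A)        # rows of Vt are v1*, v2*, v3*
    v1 = Q[:, -1]                      # unit eigenvector of the largest eigenvalue
    print(np.allclose(np.abs(v1), np.abs(Vt[0])))   # True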