Understanding a derivation of the SVD

Here's an attempt to motivate the SVD. Let $A \in \mathbb R^{m \times n}$. It's natural to ask: in what direction does $A$ have the most "impact"? In other words, for which unit vector $v$ is $\| A v \|_2$ the largest? Denote this unit vector by $v_1$. Let $\sigma_1 = \| A v_1 \|_2$, and define $u_1$ by $A v_1 = \sigma_1 u_1$.

Next, we would like to know in what direction orthogonal to $v_1$ does $A$ have the most "impact"? In other words, for which unit vector $v \perp v_1$ is $\| A v \|_2$ the largest? Denote this unit vector as $v_2$. Let $\sigma_2 = \| A v_2 \|_2$, and define $u_2$ by $A v_2 = \sigma_2 u_2$.

Question: Are the vectors $u_1$ and $u_2$ guaranteed to be orthogonal? If so, is there an easy proof for this fact, or a viewpoint that makes this obvious?
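A quick numerical sanity check (a sketch, not part of the original question): compute $v_1$, and then $v_2$ restricted to $v_1^\perp$, by power iteration on $A^T A$ (assuming numpy), form the normalized images $u_1, u_2$, and see whether they come out orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
G = A.T @ A  # maximizing ||A v|| over unit v is maximizing <v, G v>

def top_direction(G, orth_to=None, iters=500):
    """Power iteration on G, optionally restricted to the complement of a vector."""
    v = rng.standard_normal(G.shape[0])
    for _ in range(iters):
        v = G @ v
        if orth_to is not None:
            v = v - orth_to * (orth_to @ v)  # project back onto {orth_to}^perp
        v = v / np.linalg.norm(v)
    return v

v1 = top_direction(G)               # maximiser of ||A v|| over unit vectors
v2 = top_direction(G, orth_to=v1)   # maximiser over unit vectors orthogonal to v1

u1 = A @ v1 / np.linalg.norm(A @ v1)
u2 = A @ v2 / np.linalg.norm(A @ v2)

print(abs(u1 @ u2))  # ≈ 0: the images are indeed orthogonal
```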


Solution 1:

Claim: if $v_1$ is a solution of $$\max_{\|v\|=1} \|A v\|,$$ then $Av \perp A v_1$ for every $v \perp v_1$.

Suppose for contradiction that there exists $v_2 \perp v_1$ such that $Av_2 = c A v_1 + u_2$ with $c\ne 0$ and $u_2 \perp A v_1$. Then $v_1$ can't be a maximiser of $\max_{\|v\|=1} \|A v\|$:

Let $w(\epsilon) = \sqrt{1-\epsilon^2}\, v_1 + \epsilon v_2$, so that $\|w(\epsilon)\| = 1$, and $$A w(\epsilon) = \sqrt{1-\epsilon^2}\, A v_1 + \epsilon A v_2 = (\sqrt{1-\epsilon^2} + \epsilon c) A v_1 + \epsilon u_2.$$

Hence $$\|A w(\epsilon)\|^2 = \underbrace{|\sqrt{1-\epsilon^2} + \epsilon c|^2}_{>\ 1 \ \text{if } \ \epsilon c\, >\, 0 \ \text{ and } \ \epsilon \ \text{ is small enough}} \|A v_1\|^2+ \epsilon^2 \|u_2\|^2, $$ so for such $\epsilon$ we get $\|A w(\epsilon)\| > \|A v_1\|$, a contradiction.

This is enough to prove the SVD of matrices: starting from $A_1 = A$, repeatedly compute the maximiser $v_k$ of $\max_{\|v\|=1} \|A_k v\|$ and deflate via $A_{k+1} = A_{k} - A_{k}v_{k} v_{k}^T$ (note that $A_k v_k v_k^T = \sigma_k u_k v_k^T$). Applied to a symmetric matrix, this is enough to prove the spectral theorem too.
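A numpy sketch of this deflation (illustrative only; `np.linalg.svd` is used merely to extract each step's maximiser, standing in for the maximisation above):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

Ak = A.copy()
triples = []
for _ in range(3):  # a random 4x3 matrix generically has rank 3
    # top singular triple of the current deflated matrix (= the maximiser step)
    U, S, Vt = np.linalg.svd(Ak)
    sk, uk, vk = S[0], U[:, 0], Vt[0]
    triples.append((sk, uk, vk))
    # A_{k+1} = A_k - A_k v_k v_k^T : kill the direction v_k
    Ak = Ak - np.outer(Ak @ vk, vk)

# summing the rank-one pieces sigma_k u_k v_k^T recovers A
recon = sum(s * np.outer(u, v) for s, u, v in triples)
print(np.allclose(recon, A))
```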

Solution 2:

The shortcut that you might be inclined to take would be to try to prove that if $x,y$ are orthogonal then $Ax,Ay$ are orthogonal. But this is not the case, and in fact it is very far from being the case. It is not even the case when $A$ is symmetric positive definite, or even just diagonal with positive entries.

To proceed correctly you have to bring in the adjoint somehow. A way to do this is to note that $\| Ax \|^2_2 = \langle Ax,Ax \rangle = \langle x,A^T A x \rangle$ (the transpose is the adjoint for the Euclidean inner product). So the norm is maximized when you maximize this inner product. Cauchy-Schwarz tells us that it is maximized by the eigenvector of $A^T A$ with largest eigenvalue. This is your $v_1$, and its normalized image is your $u_1$.
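A small numpy sketch of this viewpoint (assumed setup, not from the answer): take the top eigenvector of $A^T A$ via `np.linalg.eigh`, form $\sigma_1$ and $u_1$, and compare with what `np.linalg.svd` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# eigenvector of A^T A with the largest eigenvalue
evals, evecs = np.linalg.eigh(A.T @ A)
v1 = evecs[:, -1]            # eigh returns eigenvalues in ascending order
sigma1 = np.sqrt(evals[-1])  # top singular value, since A^T A v1 = sigma1^2 v1
u1 = A @ v1 / sigma1         # its normalized image

# compare against numpy's SVD (singular vectors can differ by a sign)
U, S, Vt = np.linalg.svd(A)
print(np.isclose(S[0], sigma1))
print(np.isclose(abs(Vt[0] @ v1), 1.0))
```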

Now you take the orthogonal complement of $v_1$ when you go to define $v_2$. I don't see a way to conveniently work with this orthogonal complement without proving that it is invariant under $A^T A$, because otherwise $A^T A$ might not have any eigenvectors in there. (For example, with $B=\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, this construction works to find the eigenvector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, but then the orthogonal complement of that is not invariant under $B$, and indeed $B$ doesn't have any other eigenvectors.) But once you've proven that, you've almost proven the spectral theorem, so you've gotten pretty far from the simple picture that you started with.