Why is SVD on $X$ preferred to eigendecomposition of $XX^\top$ in PCA?
Solution 1:
Alright, to further explain Läuchli's example in detail:
The precise problem with forming the cross-product matrix in this case is that you have effectively made a singular matrix out of your starting matrix.
In exact arithmetic, the cross-product matrix of the Läuchli matrix should have looked like this:
$$\begin{pmatrix}1+\varepsilon^2&1&1\\1&1+\varepsilon^2&1\\1&1&1+\varepsilon^2\end{pmatrix}$$
Its exact eigenvalues are $(3+\varepsilon^2,\varepsilon^2,\varepsilon^2)$. The trouble with inexact arithmetic is that if $\varepsilon^2$ is tiny, then $1+\varepsilon^2$ is the same as $1$, and you now have the spectrum $(3,0,0)$. You do think there's a difference between $\bf \varepsilon^2$ and $\bf 0$, don't you?
The good thing, then, about direct SVD algorithms is that it is able to compute these tiny singular values accurately, much better than taking the eigensystem of the cross-product matrix. Here's a Mathematica comparison:
lauchli = With[{ε = 1*^-20},
N[{{1, 1, 1}, {ε, 0, 0}, {0, ε, 0}, {0, 0, ε}}]];
SingularValueList[lauchli, Tolerance -> 0]
{1.73205, 1.*10^-20, 1.*10^-20}
Sqrt[Eigenvalues[Transpose[lauchli].lauchli]]
{1.73205, 0. + 1.82501*10^-8 I, 0.}
(A similar demonstration could of course be done in MATLAB; luckily this example is built-in there, as gallery('lauchli', n)
)
Now, we even have a spurious (but tiny) imaginary component in the second answer; neither of the last two results matches the first.
That's why the SVD is vastly preferred.