Why is SVD on $X$ preferred to eigendecomposition of $XX^\top$ in PCA?

Solution 1:

Alright, to further explain Läuchli's example in detail:

The precise problem with forming the cross-product matrix in this case is that you have effectively made a singular matrix out of your starting matrix.

In exact arithmetic, the cross-product matrix of the Läuchli matrix should have looked like this:

$$\begin{pmatrix}1+\varepsilon^2&1&1\\1&1+\varepsilon^2&1\\1&1&1+\varepsilon^2\end{pmatrix}$$

Its exact eigenvalues are $(3+\varepsilon^2,\varepsilon^2,\varepsilon^2)$. The trouble with inexact arithmetic is that if $\varepsilon^2$ is tiny, then $1+\varepsilon^2$ is the same as $1$, and you now have the spectrum $(3,0,0)$. You do think there's a difference between $\bf \varepsilon^2$ and $\bf 0$, don't you?

The good thing, then, about direct SVD algorithms is that it is able to compute these tiny singular values accurately, much better than taking the eigensystem of the cross-product matrix. Here's a Mathematica comparison:

lauchli = With[{ε = 1*^-20},
               N[{{1, 1, 1}, {ε, 0, 0}, {0, ε, 0}, {0, 0, ε}}]];

SingularValueList[lauchli, Tolerance -> 0]
   {1.73205, 1.*10^-20, 1.*10^-20}

Sqrt[Eigenvalues[Transpose[lauchli].lauchli]]
   {1.73205, 0. + 1.82501*10^-8 I, 0.}

(A similar demonstration could of course be done in MATLAB; luckily this example is built-in there, as gallery('lauchli', n))

Now, we even have a spurious (but tiny) imaginary component in the second answer; neither of the last two results matches the first.

That's why the SVD is vastly preferred.