What is the intuitive relationship between SVD and PCA?

Singular value decomposition (SVD) and principal component analysis (PCA) are two eigenvalue methods used to reduce a high-dimensional data set into fewer dimensions while retaining important information. Online articles say that these methods are 'related' but never specify the exact relation.

What is the intuitive relationship between PCA and SVD? As PCA uses the SVD in its calculation, clearly there is some 'extra' analysis done. What does PCA 'pay attention' to differently than the SVD does? What kinds of relationships does each method utilize more in its calculations? Is one method 'blind' to a certain type of data that the other is not?


(I assume for the purposes of this answer that the data has been preprocessed to have zero mean.)

Simply put, the PCA viewpoint requires that one compute the eigenvalues and eigenvectors of the covariance matrix, which is the product $\frac{1}{n-1}\mathbf X\mathbf X^\top$, where $\mathbf X$ is the data matrix with one variable per row and one observation per column. Since the covariance matrix is symmetric, it is diagonalizable, and its eigenvectors can be normalized such that they are orthonormal:

$\frac{1}{n-1}\mathbf X\mathbf X^\top=\frac{1}{n-1}\mathbf W\mathbf D\mathbf W^\top,$

where the columns of $\mathbf W$ are the orthonormal eigenvectors and $\mathbf D$ is the diagonal matrix of eigenvalues of $\mathbf X\mathbf X^\top$.
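As a concrete illustration, here is a minimal NumPy sketch of this eigendecomposition viewpoint; the dimensions and the random data are made up for illustration, and $\mathbf X$ follows the convention above (variables in rows, observations in columns, already centered):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: p variables in rows, n observations in columns.
p, n = 3, 100
X = rng.standard_normal((p, n))
X -= X.mean(axis=1, keepdims=True)      # remove the mean of each variable

# PCA viewpoint: eigendecompose the covariance matrix C = X X^T / (n - 1).
C = X @ X.T / (n - 1)
eigvals, W = np.linalg.eigh(C)          # eigh exploits the symmetry of C
order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
eigvals, W = eigvals[order], W[:, order]

# Columns of W are the principal directions; eigvals are the variances
# explained along those directions.
```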

On the other hand, applying SVD to the data matrix $\mathbf X$ as follows:

$\mathbf X=\mathbf U\mathbf \Sigma\mathbf V^\top$

and attempting to construct the covariance matrix from this decomposition gives $$ \frac{1}{n-1}\mathbf X\mathbf X^\top =\frac{1}{n-1}(\mathbf U\mathbf \Sigma\mathbf V^\top)(\mathbf U\mathbf \Sigma\mathbf V^\top)^\top = \frac{1}{n-1}(\mathbf U\mathbf \Sigma\mathbf V^\top)(\mathbf V\mathbf \Sigma\mathbf U^\top) $$

and since $\mathbf V$ is an orthogonal matrix ($\mathbf V^\top \mathbf V=\mathbf I$),

$\frac{1}{n-1}\mathbf X\mathbf X^\top=\frac{1}{n-1}\mathbf U\mathbf \Sigma^2 \mathbf U^\top$

and the correspondence is easily seen: the left singular vectors $\mathbf U$ are the eigenvectors of the covariance matrix, and the squares of the singular values of $\mathbf X$, divided by $n-1$, are its eigenvalues (equivalently, the square roots of the eigenvalues of $\mathbf X\mathbf X^\top$ are the singular values of $\mathbf X$).
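This correspondence is easy to check numerically; the following sketch (random data again, same row/column convention as above) compares the two routes:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 100
X = rng.standard_normal((p, n))
X -= X.mean(axis=1, keepdims=True)

# PCA route: eigendecompose the covariance matrix.
eigvals, W = np.linalg.eigh(X @ X.T / (n - 1))
eigvals, W = eigvals[::-1], W[:, ::-1]            # decreasing order

# SVD route: decompose the data matrix itself.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvalues of the covariance are the squared singular values over n - 1,
# and the eigenvectors agree with the left singular vectors up to sign.
print(np.allclose(eigvals, s**2 / (n - 1)))       # True
print(np.allclose(np.abs(W), np.abs(U)))          # True
```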

In fact, using the SVD to perform PCA makes much better sense numerically than forming the covariance matrix to begin with, since forming $\mathbf X\mathbf X^\top$ can cause loss of precision. This is detailed in books on numerical linear algebra, but I'll leave you with an example of a matrix whose SVD can be computed stably, yet for which forming $\mathbf X\mathbf X^\top$ is disastrous: the Läuchli matrix

$\begin{pmatrix}1&1&1\\ \epsilon&0&0\\0&\epsilon&0\\0&0&\epsilon\end{pmatrix}^\top,$

where $\epsilon$ is a tiny number.
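To see the loss of precision concretely, here is a small sketch (with an assumed $\epsilon = 10^{-10}$) comparing the singular values obtained directly from the SVD of the Läuchli matrix with those recovered from the eigenvalues of $\mathbf X\mathbf X^\top$:

```python
import numpy as np

eps = 1e-10                 # tiny enough that eps**2 is lost next to 1 in double precision
X = np.array([[1.0, eps, 0.0, 0.0],
              [1.0, 0.0, eps, 0.0],
              [1.0, 0.0, 0.0, eps]])   # the (transposed) Läuchli matrix above

# The SVD of X recovers all three singular values, including the tiny ones.
print(np.linalg.svd(X, compute_uv=False))
# -> approximately [1.7320508, 1e-10, 1e-10]

# Forming X X^T squares the small entries: 1 + eps**2 rounds to 1, so the
# product is numerically rank one and the small singular values vanish.
eigvals = np.linalg.eigvalsh(X @ X.T)
print(np.sqrt(np.clip(eigvals, 0.0, None)))
# -> approximately [0, 0, 1.7320508]
```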


*A Tutorial on Principal Component Analysis* by Jonathon Shlens is a good introduction to PCA and its relation to SVD; see in particular Section VI, "A More General Solution Using SVD."