How can you explain the Singular Value Decomposition to non-specialists?

In two days, I am giving a presentation about a search engine I have been building over the past summer. My research involved the use of the singular value decomposition, i.e., $A = U \Sigma V^T$. I took a high school course on Linear Algebra last year, but the course was not very thorough, and though I know how to compute the SVD of a matrix, I don't know how to explain what I have in my hands once the matrix has been decomposed.

To someone who has taken linear algebra, I can say that I can decompose a matrix $A$ into a matrix $\Sigma$, whose diagonal holds the singular values, and matrices $U$ and $V$, whose columns are the left and right singular vectors of $A$. But I am not sure how to explain what a singular value is, or what the left/right singular vectors are. I will still be satisfied if there is no easy way to explain what this decomposition means, but I always prefer keeping the audience as informed as possible.
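For reference, here is a minimal NumPy sketch of the decomposition I am describing (the matrix is made up purely for illustration): it computes $U$, $\Sigma$, $V^T$ and checks that their product recovers $A$.

```python
import numpy as np

# A small made-up matrix, purely for illustration.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)           # s holds the singular values, largest first
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)  # place them on the diagonal of a 4x3 matrix

print(s)                               # the singular values
print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T
```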


Solution 1:

Much of linear algebra is about linear operators, that is, linear transformations of a space to itself. A typical result is that by choosing a suitable basis for the space, the operator can be expressed in a simple matrix form, for instance, diagonal. However, this does not work for all operators.

The singular value decomposition is the main result about linear transformations between two different spaces. It says that by choosing suitable bases for the two spaces, the transformation can be expressed in a simple matrix form, a diagonal matrix. And this works for all linear transformations. Moreover, the bases are very nice: orthonormal bases.

Geometrically, the SVD means that spheres of the proper dimension in the domain are transformed into ellipsoids in the codomain. Since the transformation may not be injective, the dimension of the ellipsoid is at most the dimension of the sphere. So you get some distortion along some axes and some collapsing along other axes. And that is all the transformation does. Every linear transformation.
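If it helps to make the picture concrete, here is a small numerical sketch (NumPy, with an arbitrary $2 \times 2$ matrix chosen just for illustration): points on the unit circle land on an ellipse whose axes point along the left singular vectors $u_i$ and have lengths $\sigma_i$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])             # an arbitrary invertible 2x2 example
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
image = A @ circle                                    # their images under A

# In the coordinates of the left singular vectors, the image satisfies
# (y1/sigma1)^2 + (y2/sigma2)^2 = 1, i.e. an ellipse with semi-axes sigma_1, sigma_2.
y = U.T @ image
print(np.allclose((y[0] / s[0])**2 + (y[1] / s[1])**2, 1.0))  # True
```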

Solution 2:

This answer attempts to give a simple algebraic intuition. Suppose $A$ is an $m \times n$ real matrix. Let $A=U\Sigma V^T$ be the SVD of $A$. Suppose that the rank of $A$ is equal to $r.$ Then the first $r$ singular values will be non-zero, while the remaining singular values will be zero.

If we write $U=[u_1 | \cdots | u_m]$ and $V=[v_1| \cdots | v_n]$, where $u_i$ is the $i^{th}$ column of $U$ (and similarly for $v_j$), then $A= \sum_{i=1}^r \sigma_i u_i v_i^T$, where $\sigma_i$ is the $i^{th}$ singular value. This shows that the linear transformation $A$ can be decomposed into a weighted sum of the linear transformations $u_i v_i^T$, each of which has rank $1$.
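For instance, a short NumPy sketch (the small matrix below is made up for illustration) confirming this rank-one expansion:

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0]])                 # a 2x3 example of rank 2
U, s, Vt = np.linalg.svd(A)                     # columns of U / rows of Vt are the u_i / v_i^T

# Rebuild A as the weighted sum of the rank-1 matrices sigma_i * u_i v_i^T.
r = np.linalg.matrix_rank(A)
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(np.allclose(A, A_sum))                    # True
```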

A large singular value $\sigma_k$ indicates that the contribution of the corresponding transformation $u_k v_k^T$ is large, and a small singular value indicates that the corresponding contribution to the action of $A$ is small. As an application of this intuition, there are cases where, for example, $A$ is a full-rank square matrix and so has no zero singular values, yet a threshold is chosen and all terms in the sum $A= \sum_{i=1}^r \sigma_i u_i v_i^T$ whose singular values fall below that threshold are discarded. In this way, $A$ is approximated by a simpler (lower-rank) matrix $\tilde{A}$, whose behavior is, for practical purposes, essentially the same as that of the original matrix.
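As a hedged illustration of this truncation, here is a NumPy sketch with a random square matrix and an arbitrarily chosen threshold; since the discarded terms are the trailing ones, the spectral-norm error equals the largest discarded singular value (Eckart–Young).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))                # a full-rank square matrix (random example)
U, s, Vt = np.linalg.svd(A)

threshold = 1.0                                  # an arbitrary cutoff, chosen for illustration
keep = s > threshold
A_tilde = (U[:, keep] * s[keep]) @ Vt[keep, :]   # discard the terms with small singular values

print(keep.sum(), "of", len(s), "terms kept")
print(np.linalg.norm(A - A_tilde, 2))            # equals the largest discarded sigma
```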

It might also help to visualize the action of $A$ on a vector $x$ by means of the above formula: $Ax = \sum_{i=1}^r (\sigma_i\langle v_i,x\rangle) u_i$. Observe that the image of $x$ is a linear combination of the vectors $u_i$, and the coefficients depend both on the magnitudes of the corresponding singular values and on how the vectors $v_i$ are oriented with respect to $x$. For example, if $x$ is orthogonal to all the $v_i$ with $\sigma_i \neq 0$, then $Ax=0$. On the other hand, if $x=v_k$ for some $k$ with $\sigma_k \neq 0$, then $Av_k = \sigma_k u_k$.
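A quick numerical check of this formula (NumPy, with a small made-up matrix and test vector):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 1.0, 0.0]])                  # a 3x3 example of full rank
U, s, Vt = np.linalg.svd(A)

x = np.array([0.5, -1.0, 2.0])                   # an arbitrary test vector
r = np.linalg.matrix_rank(A)

# A x as a combination of the u_i, weighted by sigma_i * <v_i, x>.
Ax = sum(s[i] * np.dot(Vt[i, :], x) * U[:, i] for i in range(r))
print(np.allclose(A @ x, Ax))                    # True

# A right singular vector is simply scaled and rotated: A v_k = sigma_k u_k.
k = 0
print(np.allclose(A @ Vt[k, :], s[k] * U[:, k])) # True
```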

Solution 3:

  1. One way to explain the derivation of the SVD, at least to an audience that already knows some linear algebra, is that the SVD of a matrix $A$ (which may be non-square) comes from diagonalizing the square, Hermitian matrix $$ \begin{bmatrix} O & A \\ A^{\ast} & O \end{bmatrix}, $$ where $O$ denotes a zero matrix of the appropriate size. The eigenvectors of this extended matrix are the pairs $(u_i, v_i)$ with eigenvalues $\sigma_i$ (and the pairs $(u_i, -v_i)$ with eigenvalues $-\sigma_i$), so $u_i$ and $v_i$ are your left and right singular vectors, and this gives the SVD $A = U \Sigma V^{\ast}$. In particular, the singular vectors must obey the equations $$ Av_i = \sigma_i u_i, \quad A^{\ast} u_i = \sigma_i v_i$$ (see the numerical sketch after this list).

  2. If you want to explain what the SVD means intuitively, then I think the best way is to think of the singular vectors $v_i, u_i$ as orthonormal bases of the domain and codomain of the linear transformation, respectively, chosen so that they line up with how the matrix $A$ "achieves" its rank. The largest singular value, $\sigma_1$, corresponds to the optimal way to approximate the behavior of $A$ by a rank-1 matrix, and this behavior is precisely given by $Av_1 = \sigma_1 u_1$. Similarly, if you want to approximate $A$ by a rank-2 matrix, then the underlying behavior is given by $$A(c_1 v_1 + c_2 v_2) = c_1 \sigma_1 u_1 + c_2 \sigma_2 u_2.$$ In general, using the $k$ largest singular values and the corresponding singular vectors gives you the best way to explain the behavior of $A$ using only a rank-$k$ operator. Furthermore, the sizes of the singular values tell you how much $A$ "expands" lengths along the different directions.
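To make item 1 concrete, here is a small NumPy sketch (a real matrix, so $A^{\ast} = A^T$, chosen purely for illustration): the eigenvalues of the symmetric block matrix recover the singular values (they come in pairs $\pm\sigma_i$, plus zeros), and the singular vectors satisfy the two defining equations.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.0]])                       # a real 3x2 example, so A* = A^T
U, s, Vt = np.linalg.svd(A)
m, n = A.shape

# The Hermitian (here symmetric) extended matrix [[O, A], [A^T, O]].
B = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])

# Its nonzero eigenvalues come in pairs +/- sigma_i; the largest ones are the singular values.
eigvals = np.linalg.eigvalsh(B)
top = np.sort(eigvals)[::-1][:len(s)]
print(np.allclose(top, s))                       # True

# The defining equations A v_i = sigma_i u_i and A^T u_i = sigma_i v_i.
i = 0
print(np.allclose(A @ Vt[i, :], s[i] * U[:, i]))   # True
print(np.allclose(A.T @ U[:, i], s[i] * Vt[i, :])) # True
```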