Intuitively, what is the difference between Eigendecomposition and Singular Value Decomposition?

Consider the eigendecomposition $A=P D P^{-1}$ and the SVD $A=U \Sigma V^*$. Some key differences are as follows:

  • The vectors in the eigendecomposition matrix $P$ are not necessarily orthogonal, so the change of basis isn't a simple rotation. On the other hand, the vectors in the matrices $U$ and $V$ in the SVD are orthonormal, so they do represent rotations (and possibly flips).
  • In the SVD, the nondiagonal matrices $U$ and $V$ are not necessarily the inverse of one another. They are usually not related to each other at all. In the eigendecomposition the nondiagonal matrices $P$ and $P^{-1}$ are inverses of each other.
  • In the SVD the entries in the diagonal matrix $\Sigma$ are all real and nonnegative. In the eigendecomposition, the entries of $D$ can be any complex number - negative, positive, imaginary, whatever.
  • The SVD always exists for any rectangular or square matrix, whereas the eigendecomposition exists only for square matrices, and even among square matrices it sometimes fails to exist. (A quick numerical check of these points follows below.)
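
To make these bullet points concrete, here is a minimal numpy sketch (my own illustration, not part of the original answer); the random $3 \times 3$ matrix is arbitrary.

```python
# Minimal numerical check of the bullet points above (illustration only).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # a generic real square matrix

# Eigendecomposition A = P D P^{-1}
d, P = np.linalg.eig(A)
print(d)                                              # may be complex and/or negative
print(np.allclose(P.conj().T @ P, np.eye(3)))         # typically False: P is not orthogonal

# SVD A = U Sigma V*
U, sigma, Vh = np.linalg.svd(A)
print(sigma)                                          # always real and nonnegative
print(np.allclose(U.conj().T @ U, np.eye(3)),
      np.allclose(Vh @ Vh.conj().T, np.eye(3)))       # True, True: U and V are orthonormal
```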

I encourage you to see an $(m \times n)$ real-valued matrix $A$ as a bilinear operator between two spaces; intuitively, one space lies to the left ($\mathbb{R}^m$) and the other ($\mathbb{R}^n$) to the right of $A$. "Bilinear" simply means that $A$ is linear in both directions (left to right or right to left). The operations $A$ can perform are limited to scaling, rotation, and reflection, and combinations of these; any other kind of operation is non-linear.

$A$ transforms vectors between the two spaces via multiplication:

$x^T A = y^T$ transforms a left vector $x$ to a right vector $y$.

$x = A y$ transforms a right vector $y$ to a left vector $x$.

The point of decompositions of $A$ is to identify, or highlight, aspects of the action of $A$ as an operator. The eigendecomposition of $A$ clarifies what $A$ does by finding the eigenvalues and eigenvectors that satisfy the constraint

$A x = \lambda x$.

This constraint identifies vectors (directions) $x$ that are not rotated by $A$, and the scalars $\lambda$ associated with each of those directions.

The problem with eigendecomposition is that when the matrix isn't square, the left and right spaces contain vectors of different sizes and are therefore completely different spaces; there really isn't a sense in which $A$'s action can be described as involving a "rotation", because the left and right spaces are not "oriented" relative to one another. There just isn't a way to generalize the notion of an eigendecomposition to a non-square matrix $A$.

Singular vectors provide a different way to identify vectors for which the action of $A$ is simple, one that does generalize to the case where the left and right spaces are different. A corresponding pair of singular vectors comes with a scalar $\sigma$ by which $A$ scales them, whether transforming from the left space to the right space or vice versa:

$ x^T A = \sigma y^T$

$\sigma x = A y$.

Thus, eigendecomposition represents $A$ in terms of how it scales vectors it doesn't rotate, while singular value decomposition represents $A$ in terms of corresponding vectors that are scaled the same, whether moving from the left to the right space or vice-versa. When the left and right space are the same (i.e. when $A$ is square), singular value decomposition represents $A$ in terms of how it rotates and reflects vectors that $A$ and $A^T$ scale by the same amount.
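
As a quick numerical companion (my own sketch, not part of the answer), one can check with numpy that a singular pair really is scaled by the same $\sigma$ in both directions; the $4 \times 3$ random matrix is just an arbitrary example.

```python
# Check that a singular pair (x, y) is scaled by the same sigma both ways.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))          # left space R^4, right space R^3

U, sigma, Vh = np.linalg.svd(A)
x, y, s = U[:, 0], Vh[0, :], sigma[0]    # the first left/right singular pair

print(np.allclose(x @ A, s * y))         # x^T A = sigma y^T   (left to right)
print(np.allclose(A @ y, s * x))         # A y   = sigma x     (right to left)
```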


Intuitively, the SVD says that for any linear map, there is an orthonormal frame in the domain that is mapped to an orthonormal frame in the image space, with each frame direction then scaled by a nonnegative factor.

Eigendecomposition says that there is a basis (it doesn't have to be orthonormal) such that when the matrix is applied, each basis vector is simply scaled. That is assuming you have $n$ linearly independent eigenvectors, of course; when you don't, the best you can do is a basis in which the linear map acts like an upper triangular matrix (a Jordan or Schur form).

Edit: Consider the difference for a rotation matrix in $\mathbb{R}^2$.

Here, there are no real eigenvalues, which corresponds to there being no choice of (real) basis that is simply scaled under the transformation. On the other hand, SVD makes a lot of sense here because it says we can take the standard basis in the domain, map it to the rotated version of this basis (thought of as living in the image space), and scale everything by 1.
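
Here is a small numerical companion to this example (my own addition): numpy reports a complex-conjugate pair of eigenvalues for the rotation, while its singular values are both $1$.

```python
# Rotation in R^2: no real eigenvalues, but a perfectly well-behaved SVD.
import numpy as np

theta = 0.7                                         # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.linalg.eigvals(R))                         # complex conjugate pair e^{+-i theta}
print(np.linalg.svd(R, compute_uv=False))           # [1. 1.]: every direction scaled by 1
```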


(I'm coming back to update my answer here, because in hindsight I feel the existing ones don't quite explain the details satisfactorily, and I feel fleshing out the natural follow-ups is helpful.)


Singular Value Decomposition

There was a clear explanation of the singular value decomposition here, which I annotate for clarity:

The SVD allows us to describe the effect of a matrix $A$ on a vector (via the matrix-vector product) as a three-step process $A = U \Sigma V^\dagger$:

  1. A first rotation in the input space ($V^\dagger$)
  2. A simple positive scaling that takes a vector in the input space to the output space ($\Sigma$)
  3. And another rotation in the output space ($U$)

Note that $V^\dagger$ denotes the conjugate transpose of $V$, hence $V^\dagger = V^\top$ when $V$ is real.

Note that the conditions above are mathematically the following constraints:

  • $V^\dagger V = I$    (i.e. $V$ is a rotation matrix)
  • $\Sigma = \operatorname{diag}(\vec{\sigma})$ and $\vec{\sigma} \geq \vec{0}$    ("$\operatorname{diag}$" just returns a diagonal matrix with the given diagonal)
  • $U^\dagger U = I$    (i.e. $U$ is a rotation matrix)

The fundamental theorem of linear algebra says that such a decomposition always exists.
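
As a sketch (my own, with an arbitrary complex $5 \times 3$ test matrix), the three constraints and the reconstruction $A = U \Sigma V^\dagger$ can be verified directly in numpy:

```python
# Verify the three SVD constraints and the reconstruction A = U Sigma V^dagger.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

U, sigma, Vh = np.linalg.svd(A)                       # Vh is V^dagger

print(np.allclose(Vh @ Vh.conj().T, np.eye(3)))       # V^dagger V = I
print(np.all(sigma >= 0))                             # sigma >= 0
print(np.allclose(U.conj().T @ U, np.eye(5)))         # U^dagger U = I
print(np.allclose((U[:, :3] * sigma) @ Vh, A))        # the three steps compose back to A
```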

What is it used for?

Wikipedia has a nice list, but I'll list a couple. One of the most common applications is obtaining a low-rank approximation to a matrix (see PCA), which is used for compression, speed-up, and also actual data analysis. (e.g. If my memory serves right, I seem to recall reading that the Myers-Briggs personality test was the result of running PCA on a lot of personality-trait data and seeing a few "principal axes" that describe personalities.) The other is characterizing the pseudo-inverse for analysis or proofs, since inverting the SVD factor by factor (transposing the rotations and reciprocating the nonzero singular values) gives a formula that equals the inverse when the matrix is invertible and the pseudo-inverse when it is not.
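
Here is a brief sketch of both uses (my own example, not from the answer): a rank-$k$ truncation of the SVD, and a pseudo-inverse assembled from the same factors and compared against `np.linalg.pinv`.

```python
# Low-rank approximation and pseudo-inverse, both straight from the SVD factors.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 40))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation (Eckart-Young): keep the k largest singular values.
k = 5
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]
print(np.linalg.matrix_rank(A_k))                     # 5

# Pseudo-inverse: transpose the rotations, reciprocate the nonzero singular values.
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
s_inv = np.where(s > tol, 1.0 / s, 0.0)
A_pinv = (Vh.T * s_inv) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))         # matches numpy's built-in
```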


Eigenvalue (spectral) decomposition

Similarly, for the eigendecomposition (also known as eigenvalue decomposition, spectral decomposition, or diagonalization), I would say the following:

An eigendecomposition describes the effect of a matrix $A$ on a vector as a different 3-step process $A = Q \Lambda Q^{-1}$:

  1. An invertible linear transformation ($Q^{-1}$)
  2. A scaling ($\Lambda$)
  3. The inverse of the initial transformation ($Q$)

Correspondingly, these conditions imply the following constraints:

  • $Q$ is invertible
  • $\Lambda = \operatorname{diag}(\vec{\lambda})$

This decomposition doesn't always exist; it exists exactly when $A$ has $n$ linearly independent eigenvectors. (The spectral theorem covers an important special case: normal matrices, which are diagonalizable with orthonormal eigenvectors.)

Note that the most basic requirement is that $A$ be a square matrix (but this alone is not enough).
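
A small numerical illustration of the existence caveat (my own sketch): numpy returns eigenvalues for any square matrix, but the eigendecomposition only works out when the eigenvector matrix $Q$ is actually invertible.

```python
# Diagonalizable vs. defective: the eigenvector matrix must be invertible.
import numpy as np

A_good = np.array([[2.0, 1.0],
                   [1.0, 2.0]])          # symmetric, hence diagonalizable
A_bad  = np.array([[1.0, 1.0],
                   [0.0, 1.0]])          # classic defective (non-diagonalizable) matrix

for A in (A_good, A_bad):
    lam, Q = np.linalg.eig(A)
    # Q is invertible iff its columns (the eigenvectors) are linearly independent.
    print(np.linalg.matrix_rank(Q) == A.shape[0])     # True, then False
```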

What is it used for?

Well, one thing that makes it useful is that it gives you the ability to efficiently raise a matrix to a large power: $A^n = Q \Lambda^n Q^{-1}$. For this reason (and others) it's used heavily in engineering to, say, efficiently analyze and predict the behavior of a linear dynamical system at a future point in time, since this is much faster than manually exponentiating the matrix directly. It's also used to analyze the response of a linear system at different frequencies. (Sinusoids of different frequencies are orthogonal, so you get the orthogonal diagonalizability for free.) Furthermore, it's also a problem that repeatedly comes up when solving differential equations analytically.
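
For instance, a minimal sketch of the fast-power trick (my own example; the small matrix just stands in for a linear dynamical system):

```python
# A^n = Q Lambda^n Q^{-1}: exponentiate the eigenvalues instead of the matrix.
import numpy as np

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])               # a toy system x_{t+1} = A x_t
n = 50

lam, Q = np.linalg.eig(A)
A_n = Q @ np.diag(lam ** n) @ np.linalg.inv(Q)

print(np.allclose(A_n, np.linalg.matrix_power(A, n)))   # same answer as repeated multiplication
```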


At this point, you probably wonder:

So when is an eigendecomposition different from an SVD?

Assuming both exist, notice that:

  • The eigenvalues (in $\Lambda$) may be negative

  • The eigenvectors (in $Q$) may be non-orthogonal

  • We usually normalize the eigenvectors (the columns of $Q$) to unit length, since any rescaling of them is cancelled by $Q^{-1}$; but if we don't, that can also cause a mismatch between the factors.

So that means, in order for the SVD of $A$ to be equal to its eigendecomposition, we need $A$ to:

  • Have orthonormal eigenvectors (meaning all it does is to scale its input along $n$ orthogonal directions, where $A$ is $n\times n$)

  • Have nonnegative eigenvalues (colloquially, it must be a real matrix and not "flip" anything)

It turns out that the above conditions are equivalent to the following:

  • $A$ must be a symmetric real matrix (this is equivalent to $A$ having real eigenvalues and orthonormal eigenvectors)

  • $A$'s (real) eigenvalues must be nonnegative (copied from above; I can't think of a "nicer" equivalent)

So, we say:

The two are equal when $A$ is symmetric positive-semidefinite.

This can be denoted as $A \succeq 0 \ \land \ A^\dagger = A$.

Bonus: This is exactly the set of matrices $A \in \mathbb{R}^{n\times n}$ that satisfy $A = B^\dagger B$ for some $B$. (Prove it!)
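
A quick numerical check of this statement (my own sketch): build a symmetric positive-semidefinite matrix as $B^\top B$ and compare the two decompositions.

```python
# For a symmetric PSD matrix, eigenvalues and singular values coincide.
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
A = B.T @ B                               # symmetric positive-semidefinite by construction

lam, Q = np.linalg.eigh(A)                # symmetric eigendecomposition (ascending order)
U, sigma, Vh = np.linalg.svd(A)           # SVD (descending order)

print(np.allclose(np.sort(lam), np.sort(sigma)))                 # same spectrum
print(np.allclose(Q @ np.diag(lam) @ Q.T, (U * sigma) @ Vh))     # both reconstruct A
```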


The SVD arises from asking in which directions a linear transformation has the greatest impact. The eigendecomposition arises from asking in which directions a quadratic form has the greatest impact.

Here is some intuition, which can be made rigorous. Let $A$ be a real $m \times n$ matrix. For which unit vector $v$ is $\| A v \|_2$ the largest? Let $$ v_1 \in \arg \max_{\|v\|_2 = 1} \, \| A v \|_2 $$ and let $$ v_2 \in \arg \max_{\| v \|_2 = 1, v \perp v_1 } \, \| A v \|_2. $$ Let $u_1 = A v_1$ and $u_2 = A v_2$. Claim: $u_2$ is orthogonal to $u_1$. Reason: If $u_2$ were not orthogonal to $u_1$, then $v_1$ would not be a maximizer because it would be possible to improve $v_1$ by perturbing it a bit in the direction $v_2$. Perturbing $v_1$ in the orthogonal direction $v_2$ does not change the norm of $v_1$, but perturbing $u_1 = A v_1$ in the non-orthogonal direction $A v_2$ does change the norm of $u_1$. (When you walk on the surface of the earth, you are moving orthogonally to the radius of the earth and your distance from the center of the earth does not change.)

Continuing as above, we discover the SVD of $A$.
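
A numerical companion to this construction (my own sketch): iterating $v \mapsto A^\top A v / \|A^\top A v\|$ is a crude way to climb toward the maximizer of $\| A v \|_2$ over unit vectors, and the value it reaches matches the top singular value reported by numpy.

```python
# Power iteration on A^T A finds v1 = argmax ||A v|| over unit vectors.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

v = rng.standard_normal(4)
v /= np.linalg.norm(v)
for _ in range(500):                      # plenty of iterations for a toy example
    v = A.T @ (A @ v)
    v /= np.linalg.norm(v)

sigma1 = np.linalg.svd(A, compute_uv=False)[0]
print(np.isclose(np.linalg.norm(A @ v), sigma1))      # max ||A v|| over unit v equals sigma_1
```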

Next, let $A$ be a real symmetric $n \times n$ matrix. For which unit vector $v$ is $v^T A v$ the largest? Let $$ v_1 \in \arg \max_{ \| v \|_2 = 1} \, v^T A v. $$ Claim: $A v_1$ is a scalar multiple of $v_1$. Reason: If $A v_1 = \alpha v_1 + u$, with $u \perp v_1$ and $u \neq 0$, then $v_1$ would not be a maximizer because $v_1$ could be improved by perturbing it a bit in the direction $u$. Perturbing $v_1$ in the orthogonal direction $u$ does not change the norm of $v_1$, but \begin{align} (v_1 + \epsilon u)^T A(v_1 + \epsilon u) &\approx v_1^T A v_1 + 2 \epsilon u^T A v_1. \end{align} The term $2 \epsilon u^T A v_1 = 2 \epsilon \, u^T(\alpha v_1 + u) = 2 \epsilon \|u\|_2^2 \neq 0$ is a non-negligible increase in the value of $v_1^T A v_1$ when $\epsilon > 0$.

Continuing like this, we discover the eigendecomposition of $A$.
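
And a matching check for the quadratic-form story (my own sketch): the top eigenvector of a symmetric matrix attains the maximum of $v^T A v$ over unit vectors, and random unit vectors never beat it.

```python
# The Rayleigh quotient v^T A v over unit vectors is maximized by the top eigenvector.
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                         # a generic real symmetric matrix

lam, Q = np.linalg.eigh(A)                # eigenvalues in ascending order
v_top, lam_max = Q[:, -1], lam[-1]

print(np.isclose(v_top @ A @ v_top, lam_max))         # the maximum value is lambda_max

V = rng.standard_normal((2000, 5))                    # random unit vectors for comparison
V /= np.linalg.norm(V, axis=1, keepdims=True)
print(np.all(np.einsum('ij,jk,ik->i', V, A, V) <= lam_max + 1e-12))
```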


I think we can also discover the SVD by asking in which direction-pairs the bilinear form $(u,v) \mapsto u^T Av$ has the greatest impact. Let $$ (u_1,v_1) \in \arg \max_{\|u\|_2 = 1, \|v\|_2 = 1} \, u^T A v. $$ Claim: $A v_1$ is a scalar multiple of $u_1$, and $A^T u_1$ is a scalar multiple of $v_1$. Reason: Assume that either $A v_1 \neq 0$ or $A^T u_1 \neq 0$ (otherwise the claim is trivial). Suppose that $A v_1 = \alpha u_1 + u$, with $u \perp u_1$ and $u \neq 0$, and also that $A^T u_1 = \beta v_1 + v$, with $v \perp v_1$ and $v \neq 0$. Then $(u_1,v_1)$ is not a maximizer because $(u_1,v_1)$ can be improved by perturbing it a bit in the direction $(u,v)$: \begin{align} (u_1 + \delta \,u)^T A (v_1 + \epsilon v) &\approx u_1^T A v_1 + \epsilon v^T A^T u_1 + \delta \, u^T A v_1. \end{align} The term $\epsilon v^T A^T u_1 + \delta \, u^T A v_1$ is a non-negligible increase in the value of $u_1^T A v_1$ (provided that $\epsilon$ and $\delta$ are chosen to have the correct signs).

Continuing like this, we discover the SVD of $A$. This approach is interesting because it is similar to the quadratic form approach given above. In this viewpoint, the SVD is for bilinear forms and the eigendecomposition is for quadratic forms. In both cases we want to know which inputs produce the largest outputs.
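
The same kind of numerical sanity check works here too (my own sketch): $u_1^T A v_1 = \sigma_1$, and no pair of random unit vectors exceeds that value.

```python
# max over unit u, v of u^T A v equals sigma_1, attained at the top singular pair.
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))

U, sigma, Vh = np.linalg.svd(A)
u1, v1 = U[:, 0], Vh[0, :]

print(np.isclose(u1 @ A @ v1, sigma[0]))              # the maximum value is sigma_1

u = rng.standard_normal((2000, 6)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.standard_normal((2000, 4)); v /= np.linalg.norm(v, axis=1, keepdims=True)
print(np.all(np.einsum('ij,jk,ik->i', u, A, v) <= sigma[0] + 1e-12))
```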