Why does calculating matrix inverses, roots, etc. using the spectrum of a matrix work?

Two classic books where you can find all the details about this stuff:

  1. Gantmacher, The Theory of Matrices, Chelsea.
  2. Lancaster and Tismenetsky, The Theory of Matrices, Academic Press.

Actually, for "hand" calculations, this works through the Jordan canonical form: you find the Jordan canonical form $J$ of your matrix, together with the change-of-basis matrix $S$:

$$ A = SJS^{-1} \ . $$

Then you prove that, for any polynomial $p(t)$, you have

$$ p(A) = S p(J) S^{-1} \ . $$

Hence, since $f(A)$ is defined as $p(A)$ for a suitable polynomial $p(t)$ (more on this below), also

$$ f(A) = S f(J) S^{-1} $$

and you only need to compute $p(J)$ for Jordan matrices.
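As a small illustration (not part of the original argument), here is a hedged Python sketch using SymPy's `jordan_form`; the matrix `A` and the choice $f=\exp$ are just assumptions for the example. Note that SymPy places the $1$'s of a Jordan block above the diagonal, the transpose of the convention used below, but the identity $f(A) = S\,f(J)\,S^{-1}$ holds either way.

```python
import sympy as sp

# An illustrative non-diagonalizable matrix (assumption for the example).
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])

# Jordan decomposition: A = S * J * S**(-1).
S, J = A.jordan_form()

# With f = exp, f(J) can be taken from SymPy's matrix exponential of J.
fJ = J.exp()

# f(A) = S f(J) S^{-1} agrees with the matrix exponential computed directly.
fA = sp.simplify(S * fJ * S.inv())
print(sp.simplify(fA - A.exp()))   # zero matrix
```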

Which you do as follows: first, if you have a diagonal block matrix

$$ J = \begin{pmatrix} J_1 & 0 & \dots & 0 \\ 0 & J_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_r \end{pmatrix} $$

you can easily prove that

$$ p(J) = \begin{pmatrix} p(J_1) & 0 & \dots & 0 \\ 0 & p(J_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & p(J_r) \end{pmatrix} $$

So again, on the one hand,

$$ f(J) = \begin{pmatrix} f(J_1) & 0 & \dots & 0 \\ 0 & f(J_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & f(J_r) \end{pmatrix} $$
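As a quick numerical illustration of this block-by-block rule (the blocks and the choice $f=\exp$ are made-up assumptions; SciPy is used only for the comparison):

```python
import numpy as np
from scipy.linalg import block_diag, expm

# Two made-up blocks (illustrative assumption), with f = exp.
J1 = np.array([[2.0, 0.0],
               [1.0, 2.0]])
J2 = np.array([[5.0]])
J = block_diag(J1, J2)

# f applied to the block-diagonal matrix equals f applied block by block.
print(np.allclose(expm(J), block_diag(expm(J1), expm(J2))))   # True
```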

and, on the other hand, you only need to know $p(J)$ when $J$ is a single Jordan block. If

$$ J = \begin{pmatrix} \lambda & 0 & 0 & \dots & 0 & 0 \\ 1 & \lambda & 0 & \dots & 0 & 0 \\ 0 & 1 & \lambda & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & \lambda & 0 \\ 0 & 0 & \dots & 0 & 1 & \lambda \end{pmatrix} $$

is an $r\times r$ Jordan block, then

$$ p(J) = \begin{pmatrix} p(\lambda ) & 0 & 0 & \dots & 0 & 0 \\ p'(\lambda)/ 1! & p(\lambda) & 0 & \dots & 0 & 0 \\ p''(\lambda)/ 2! & p'(\lambda)/ 1! & p(\lambda) & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ p^{(r-2)}(\lambda)/(r-2)! &p^{(r-3)}(\lambda)/(r-3)! & \dots & p'(\lambda)/ 1! & p(\lambda) & 0 \\ p^{(r-1)}(\lambda)/(r-1)! &p^{(r-2)}(\lambda)/(r-2)! & \dots & p''(\lambda)/2! & p'(\lambda)/ 1! & p(\lambda) \end{pmatrix} $$

Hence, again, everything is written in terms of $f$:

$$ f(J) = \begin{pmatrix} f(\lambda ) & 0 & 0 & \dots & 0 & 0 \\ f'(\lambda)/ 1! & f(\lambda) & 0 & \dots & 0 & 0 \\ f''(\lambda)/ 2! & f'(\lambda)/ 1! & f(\lambda) & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ f^{(r-2)}(\lambda)/(r-2)! &f^{(r-3)}(\lambda)/(r-3)! & \dots & f'(\lambda)/ 1! & f(\lambda) & 0 \\ f^{(r-1)}(\lambda)/(r-1)! &f^{(r-2)}(\lambda)/(r-2)! & \dots & f''(\lambda)/2! & f'(\lambda)/ 1! & f(\lambda) \end{pmatrix} $$
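As a hedged sketch of how one might assemble $f(J)$ for a single Jordan block in code (the eigenvalue, the block size and $f=\exp$ are assumptions; the $1$'s sit below the diagonal, as in the answer):

```python
import math
import numpy as np
from scipy.linalg import expm

def f_of_jordan_block(lam, r, derivative):
    """Build f(J) for an r x r Jordan block with eigenvalue lam (1's below the
    diagonal); derivative(k, lam) must return the k-th derivative of f at lam."""
    F = np.zeros((r, r))
    for k in range(r):                               # k-th sub-diagonal
        value = derivative(k, lam) / math.factorial(k)
        for i in range(k, r):
            F[i, i - k] = value
    return F

# For f = exp every derivative is exp itself (assumption: f = exp).
fJ = f_of_jordan_block(2.0, 4, lambda k, lam: math.exp(lam))

# Cross-check against scipy's matrix exponential of the Jordan block itself.
J = 2.0 * np.eye(4) + np.diag(np.ones(3), k=-1)
print(np.allclose(fJ, expm(J)))                      # True
```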

And, in this version of the story, you don't actually need to know the polynomial $p(t)$ for your function $f(t)$ and matrix $A$; but it's not difficult to find it anyway: it's called the Lagrange-Sylvester polynomial, and it is a sort of mixture of the classical Lagrange interpolation polynomial and a Taylor series.
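For what it's worth, here is one hedged way to compute such a polynomial in SymPy: impose that $p$ and its derivatives match $f$ on the spectrum, with multiplicities. The spectral data and the choice $f=\exp$ below are pure assumptions for the sketch.

```python
import sympy as sp

t = sp.symbols('t')
f = sp.exp(t)

# Assumed spectral data: eigenvalue 2 with a Jordan block of size 2,
# eigenvalue 5 with a block of size 1, so a degree-2 polynomial suffices.
spectrum = [(2, 2), (5, 1)]                 # (eigenvalue, block size)
degree = sum(m for _, m in spectrum) - 1

coeffs = sp.symbols(f'c0:{degree + 1}')
p = sum(c * t**k for k, c in enumerate(coeffs))

# Interpolation conditions: p^{(j)}(lam) = f^{(j)}(lam) for j < block size.
conditions = []
for lam, m in spectrum:
    for j in range(m):
        conditions.append(sp.Eq(sp.diff(p, t, j).subs(t, lam),
                                sp.diff(f, t, j).subs(t, lam)))

solution = sp.solve(conditions, coeffs)
p_LS = sp.simplify(p.subs(solution))
print(p_LS)   # the Lagrange-Sylvester polynomial for f = exp and this spectrum
```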

EDIT

Nevertheless, it seems that I forgot to answer the most important question: "Why does this whole thing actually work?"

That is, why does defining

$$ f(A) = p(A) $$

for some polynomial $p(t)$ that agrees with $f(t)$ on the spectrum of $A$ (that is, $p$ and its derivatives match $f$ and its derivatives at each eigenvalue, up to that eigenvalue's multiplicity in the minimal polynomial) make sense at all? That is, for what reason can we call $p(A)$ (the computable thing) the "value" of $f(t)$ at the matrix $A$?

Because of the following:

Theorem. (Gantmacher, chapter V, $\S 4$, theorem 2). If the function $f(t)$ can be expanded in a power series in the circle $\vert t - \lambda \vert < r$,

$$ f(t) = \sum_{n=0}^\infty a_n(t-\lambda)^n \ , $$

then this expansion remains valid when the scalar argument $t$ is replaced by a matrix $A$ whose eigenvalues lie within the circle of convergence.

That is, under the hypotheses of the theorem, you have

$$ f(A) = \sum_{n=0}^\infty a_n(A-\lambda I)^n \ , $$

where the $f(A)$ on the left-hand side means $p(A)$, the value of the Lagrange-Sylvester polynomial at $A$.

So, why not define $f(A)$ as this last power series (i.e., the Taylor series of $f(t)$)? Well, because then you would have to talk at length about convergence issues for matrix series... And you would end up, finally, at the same point: relying on the Jordan canonical form for actual computations. So the Lagrange-Sylvester device lets you avoid convergence issues altogether, provided you're willing to accept $f(A) = p(A)$ as a sound definition.
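Just as a numerical illustration of the theorem (not needed for the argument), one can watch the partial sums approach the matrix value; the matrix, the choice $f=\exp$ and the expansion point $\lambda=0$ are assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative matrix (assumption); f = exp expanded around lambda = 0,
# whose radius of convergence is infinite, so the theorem applies.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

partial = np.zeros_like(A)
term = np.eye(2)                      # A^0 / 0!
for n in range(25):
    partial = partial + term          # sum of A^k / k! for k <= n
    term = term @ A / (n + 1)

print(np.max(np.abs(partial - expm(A))))   # ~1e-16: the series converges to exp(A)
```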


When the matrix $A$ is not diagonalizable, working with the "spectrum" becomes technically involved, but even then, via Cauchy's integral formula from complex analysis, you can make sense of $\sin(A)$, etc., in a single formula not involving any eigenvalues or diagonalization process: when all eigenvalues of $A$ are contained in the disk $D_R$ of radius $R$ around $0$, then $$\sin(A)={1\over2\pi i}\int_{\partial D_R}\sin z\,(zI-A)^{-1}\ dz\ .$$
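Here is a rough numerical check of that contour formula; the matrix, the radius $R$ and the simple quadrature rule are all assumptions of the sketch, and SciPy's `sinm` is used only for comparison.

```python
import numpy as np
from scipy.linalg import sinm

# Illustrative non-diagonalizable matrix with (double) eigenvalue 2.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
R, N = 5.0, 2000       # circle radius (> spectral radius) and quadrature points

total = np.zeros((2, 2), dtype=complex)
for k in range(N):
    z = R * np.exp(2j * np.pi * k / N)
    dz = 2j * np.pi * z / N                          # dz along the circle
    total += np.sin(z) * np.linalg.inv(z * np.eye(2) - A) * dz

fA = total / (2j * np.pi)
print(np.max(np.abs(fA - sinm(A))))                  # ~ machine precision
```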

To see why such definitions are natural, think of $A$ as a linear map for a moment: in the one-dimensional case a map $A:\ {\mathbb R}\to{\mathbb R}$ is just a scaling $t\mapsto \lambda t$ with a certain factor $\lambda\in{\mathbb C}$, and the only meaning $f(A)$ could have then is that $f(A)$ scales by the factor $f(\lambda)$.

In the $n$-dimensional diagonalizable case the space $V={\mathbb R}^n$ where $A$ acts can be written as a direct sum $V=\oplus_{i=1}^n V_i$ of the one-dimensional eigenspaces $V_i:=\{x| Ax=\lambda_i x\}$. This means that $A$ acts as $n$ independent one-dimensional scalings $t\mapsto \lambda_i t$ in parallel. If you are given a function $f$ whose domain $\Omega\subset{\mathbb C}$ contains the $\lambda_i$, then $f(A)$ restricted to one of the $V_i$ is just the map $$f(A)\restriction V_i: \quad t\mapsto f(\lambda_i) t\ ,$$ where $t$ is a "local" coordinate in $V_i$, denoted by $t_i$ when we look at all of these at the same time.

Back to matrices: since all this makes sense when we look at $A$ as a linear map $V\to V$, by general principles it also makes sense in the world of matrices. In particular, if $T$ is the matrix whose columns are the eigenvectors of $A$, i.e., the basis vectors of the $V_i$ above, and $D={\rm diag}(\lambda_1,\ldots,\lambda_n)$ is the diagonal matrix containing the eigenvalues, then $$f(A)=T\ {\rm diag}\bigl(f(\lambda_1),\ldots,f(\lambda_n)\bigr)\ T^{-1}\ .$$
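As an illustrative sketch only (the matrix and $f=\exp$ are assumptions), this last recipe is a few lines of NumPy:

```python
import numpy as np
from scipy.linalg import expm

# A diagonalizable matrix chosen just for illustration (eigenvalues 5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, T = np.linalg.eig(A)          # columns of T are eigenvectors of A

# f(A) = T diag(f(lambda_1), ..., f(lambda_n)) T^{-1}, here with f = exp.
fA = T @ np.diag(np.exp(eigvals)) @ np.linalg.inv(T)

print(np.allclose(fA, expm(A)))        # True
```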