Is the converse of the Cayley-Hamilton Theorem true?

This question is motivated by the following problem:

Let $I\neq A\neq -I$, where $I$ is the identity matrix and $A$ is a real $2\times 2$ matrix. If $A=A^{-1}$, then the trace of $A$ is
$$\text{(A) } 2 \quad \text{(B) } 1 \quad \text{(C) } 0 \quad \text{(D) } -1 \quad \text{(E) } -2$$

Since $A=A^{-1}$, we have $A^2=I$, i.e., $p(A)=0$ for $p(x)=x^2-1$. If the converse of the Cayley-Hamilton Theorem were true, then $x^2-1$ would be the characteristic polynomial of $A$, so every eigenvalue $\lambda$ would satisfy $\lambda^2=1$, i.e., $\lambda=\pm1$, and then $\operatorname{trace}(A)=1+(-1)=0$.

Here are my questions:

  1. Is (C) the answer to the quoted problem?
  2. Is the converse of the Cayley-Hamilton Theorem, i.e., "for a square real matrix $A$, if $p(A)=0$, then $p(x)$ is the characteristic polynomial of $A$," true? If it is not, then what is the right method to solve the problem above?

No, the converse of the Cayley-Hamilton Theorem is not true for $n\times n$ matrices with $n\gt 1$; in particular, it fails for $2\times 2$ matrices.

First, notice that if $p(A)=0$, then $q(A)=0$ for every multiple $q(x)$ of $p(x)$; so you would want to amend the converse to say "if $p(A)=0$, then $p(x)$ is a multiple of the characteristic polynomial of $A$." But even that amended version is false.

However, the only failures in the $2\times 2$ case are the scalar multiples of the identity. If $A=\lambda I$, then $p(x)=x-\lambda$ satisfies $p(A)=0$, but the characteristic polynomial of $A$ is $(x-\lambda)^2$, not $p(x)$.
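
Here is a minimal SymPy check of that counterexample, with the illustrative choice $\lambda = 2$ (nothing is special about this value):

```python
from sympy import eye, symbols

x = symbols('x')
A = 2 * eye(2)                  # A = 2I, a scalar multiple of the identity
print(A - 2 * eye(2))           # p(A) for p(x) = x - 2: the zero matrix
print(A.charpoly(x).as_expr())  # x**2 - 4*x + 4 = (x - 2)**2, which is not p(x)
```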

For bigger matrices, there are other situations where even this weakened converse fails.
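
For instance, $\operatorname{diag}(1,1,2)$ is annihilated by $(x-1)(x-2)$, which is not a multiple of its characteristic polynomial $(x-1)^2(x-2)$. A small SymPy sketch of this (the $3\times 3$ matrix is just one illustrative choice):

```python
from sympy import diag, eye, factor, symbols

x = symbols('x')
A = diag(1, 1, 2)
q_of_A = (A - eye(3)) * (A - 2 * eye(3))  # q(A) for q(x) = (x - 1)(x - 2)
print(q_of_A)                             # the zero matrix, so q(A) = 0
print(factor(A.charpoly(x).as_expr()))    # (x - 2)*(x - 1)**2; q is not a multiple of it
```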

The concept that captures the "converse" of Cayley-Hamilton is the minimal polynomial of the matrix, which is the monic polynomial $p(x)$ of smallest degree such that $p(A)=0$. It is then easy to show (using the division algorithm) that if $q(x)$ is any polynomial for which $q(A)=0$, then $p(x)\mid q(x)$. (Be careful to justify that if $m(x)=r(x)s(x)$, then $m(A)=r(A)s(A)$; this is not immediate because matrix multiplication is not in general commutative!) So we have:

Theorem. Let $A$ be an $n\times n$ matrix over $\mathbf{F}$, and let $\mu(x)$ be the minimal polynomial of $A$. If $p(x)\in \mathbf{F}[x]$ is any polynomial such that $p(A)=0$, then $\mu(x)$ divides $p(x)$.
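
A small SymPy sanity check of the theorem; the involution below is an illustrative choice, and its minimal polynomial is $\mu(x)=x^2-1$:

```python
from sympy import Matrix, div, eye, symbols

x = symbols('x')
A = Matrix([[0, 1], [1, 0]])
mu = x**2 - 1         # minimal polynomial of A
q = x**4 - 1          # another polynomial annihilating A, since A**2 = I
print(A**4 - eye(2))  # the zero matrix: q(A) = 0
print(div(q, mu, x))  # (x**2 + 1, 0): remainder 0, so mu divides q
```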

The Cayley-Hamilton Theorem shows that the characteristic polynomial is always a multiple of the minimal polynomial. In fact, one can prove that every irreducible factor of the characteristic polynomial must divide the minimal polynomial. Thus, for a $2\times 2$ matrix, if the characteristic polynomial splits and has distinct roots, then the characteristic and minimal polynomials are equal. If the characteristic polynomial is an irreducible quadratic and we are working over $\mathbb{R}$, then again the minimal and characteristic polynomials are equal. But if the characteristic polynomial is of the form $(x-a)^2$, then the minimal polynomial is either $x-a$ (when the matrix equals $aI$) or $(x-a)^2$ (when the matrix is not diagonalizable).
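
These cases can be checked in SymPy; each matrix below is one illustrative representative, not the only possibility:

```python
from sympy import Matrix, factor, symbols

x = symbols('x')
cases = [
    ("distinct real roots", Matrix([[1, 0], [0, 3]])),               # char = min = (x-1)(x-3)
    ("irreducible over R",  Matrix([[0, -1], [1, 0]])),              # char = min = x**2 + 1
    ("repeated root, not diagonalizable", Matrix([[1, 1], [0, 1]])), # min = (x-1)**2
    ("repeated root, A = aI", Matrix([[1, 0], [0, 1]])),             # min = x - 1
]
for name, A in cases:
    print(name, factor(A.charpoly(x).as_expr()),
          A.is_diagonalizable(reals_only=True))
```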

As for solving this problem: if $\lambda$ is an eigenvalue of $A$, and $A$ is invertible, then $\lambda\neq 0$, and $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$: for if $\mathbf{x}\neq\mathbf{0}$ is such that $A\mathbf{x}=\lambda\mathbf{x}$, then multiplying both sides by $A^{-1}$ we get $\mathbf{x} = A^{-1}(\lambda \mathbf{x}) = \lambda A^{-1}\mathbf{x}$. Dividing through by $\lambda$ shows $\mathbf{x}$ is an eigenvector of $A^{-1}$ corresponding to $\frac{1}{\lambda}$.
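
A quick numeric illustration of this fact; the invertible matrix here is arbitrary:

```python
from sympy import Matrix

A = Matrix([[2, 1], [0, 3]])
print(A.eigenvals())        # {2: 1, 3: 1}
print(A.inv().eigenvals())  # {1/2: 1, 1/3: 1} -- the reciprocals
```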

Since $A=A^{-1}$, that means that if $\lambda_1,\lambda_2$ are the eigenvalues of $A$, then $\lambda_1 = \frac{1}{\lambda_1}$ and $\lambda_2=\frac{1}{\lambda_2}$; thus, each eigenvalue is either $1$ or $-1$.

If the matrix is diagonalizable, then we cannot have both equal to $1$ (since then $A=I$), and they cannot both be equal to $-1$ (since $A\neq -I$), so one eigenvalue is $1$ and the other is $-1$. Since the trace of a square matrix equals the sum of its eigenvalues, the sum of the eigenvalues is $0$.
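
For a concrete check, here is one involution other than $\pm I$ (chosen purely for illustration) verifying the whole chain: $A=A^{-1}$, eigenvalues $\pm 1$, trace $0$, diagonalizable:

```python
from sympy import Matrix, eye

A = Matrix([[1, 2], [0, -1]])
print(A * A == eye(2))        # True: A**2 = I, i.e. A = A**-1
print(A.eigenvals())          # {1: 1, -1: 1}
print(A.trace())              # 0
print(A.is_diagonalizable())  # True
```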

Why is $A$ diagonalizable? If it has two distinct eigenvalues, $1$ and $-1$, then there is nothing to do; we know it is diagonalizable. If it has a repeated eigenvalue, say $1$, but $A-I$ is not the zero matrix, pick $\mathbf{x}\in \mathbb{R}^2$ such that $A\mathbf{x}\neq \mathbf{x}$; then $$\mathbf{0}=(A-I)^2\mathbf{x} = (A^2-2A + I)\mathbf{x} = (2I-2A)\mathbf{x}$$ by the Cayley-Hamilton Theorem (the characteristic polynomial is $(x-1)^2$) and the fact that $A^2=I$. But that means $2(\mathbf{x}-A\mathbf{x})=\mathbf{0}$, i.e., $A\mathbf{x}=\mathbf{x}$, contradicting our choice of $\mathbf{x}$. Thus, $A-I=0$, so $A=I$ and $A$ is diagonalizable. A similar argument shows that if $-1$ is the only eigenvalue, then $A+I=0$. (Hiding behind that paragraph is the fact that if the minimal polynomial is squarefree and splits, then the matrix is diagonalizable; since $p(x)=x^2-1=(x-1)(x+1)$ is a multiple of the minimal polynomial, the matrix must be diagonalizable.)

So this completes the proof that the trace must be $0$, given that $A\neq I$ and $A\neq -I$.


  1. If $A^2 = I$, then the eigenvalues of $A$ satisfy $\lambda^2 = 1$, so each is either $+1$ or $-1$. As they cannot both be $+1$ (else $A=I$) or both $-1$ (else $A=-I$), we must have one of each, and their sum (the trace) is $0$.

  2. If $p(A) = 0$, then $p(x)$ is divisible by the minimal polynomial of $A$. As an extreme example, take $A=0$. Then $p(A) = 0$ for lots of polynomials, but the characteristic polynomial is $x^n$, where $A$ is $n\times n$; see the check below.
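
A minimal SymPy check of that extreme example, for $n = 2$:

```python
from sympy import symbols, zeros

x = symbols('x')
A = zeros(2)                    # the 2x2 zero matrix
print(A.charpoly(x).as_expr())  # x**2
print(A, A * A, A**3 + 5 * A)   # p(A) = 0 for p(x) = x, x**2, x**3 + 5x, ...
```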