Coefficients of the characteristic polynomial

Let $A$ be an $n\times n$ matrix. Then its characteristic polynomial is

$$\phi_A(x) = \det(xI - A) = x^n - (\operatorname{tr} A)x^{n-1} + \cdots + (-1)^{n-1}(\operatorname{tr}(\operatorname{adj}A))x + (-1)^n\det A,$$

where $\operatorname{adj}A$ denotes the adjugate matrix of $A$.
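
As a quick machine check (my addition, not part of the original question), the four displayed coefficients can be verified with SymPy on a random integer matrix; the size $n=4$ and the entry range are arbitrary choices:

```python
import sympy as sp

x = sp.symbols('x')
n = 4                                   # arbitrary small size for the check
A = sp.randMatrix(n, n, min=-5, max=5)  # random integer matrix

phi = sp.expand((x * sp.eye(n) - A).det())
assert phi.coeff(x, n) == 1                                       # monic
assert phi.coeff(x, n - 1) == -A.trace()                          # -(tr A)
assert phi.coeff(x, 1) == (-1) ** (n - 1) * A.adjugate().trace()  # (-1)^(n-1) tr(adj A)
assert phi.coeff(x, 0) == (-1) ** n * A.det()                     # (-1)^n det A
```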

My question: Why are the coefficients of $\phi_A$ those shown above?

I can see why $(-1)^n\det A$ is the constant term: the constant term is $\phi_A(0)=\det(0\cdot I - A)=\det(-A)=(-1)^n\det A$, or, in terms of the roots, $$\phi_A(0) = (0-\mu_1)\cdots(0-\mu_n) = (-1)^n\mu_1\cdots\mu_n = (-1)^n\det A,$$

where $\mu_i$ are the roots ($i=1,\dots,n$). I also see why the coefficient of $x^{n-1}$ is minus the trace, since by Laplace expansion (along the first row):

\begin{align*} &\det(xI-A)\\[10pt] &= \begin{vmatrix} x-a_{11} & -a_{12} & \cdots & -a_{1n}\\ -a_{21} & x-a_{22} & \cdots & -a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ -a_{n1} & -a_{n2} & \cdots & x-a_{nn} \end{vmatrix}\\[10pt] &= (x-a_{11})\begin{vmatrix} x-a_{22} & \cdots & -a_{2n}\\ \vdots & \ddots & \vdots\\ -a_{n2} & \cdots & x-a_{nn} \end{vmatrix} + \underbrace{a_{12} \begin{vmatrix} -a_{21} & \cdots & -a_{2n}\\ \vdots & \ddots & \vdots\\ -a_{n1} & \cdots & x-a_{nn} \end{vmatrix} + \cdots}_{\text{polynomial of degree at most $n-2$}}\\[10pt] &=(x-a_{11})\bigg((x-a_{22})\begin{vmatrix} x-a_{33} & \cdots & -a_{3n}\\ \vdots & \ddots & \vdots\\ -a_{n3} & \cdots & x-a_{nn} \end{vmatrix} +\underbrace{\underbrace{a_{23}\begin{vmatrix} -a_{32} & \cdots & -a_{3n}\\ \vdots & \ddots & \vdots\\ -a_{n2} & \cdots & x-a_{nn} \end{vmatrix}+\cdots}_{\text{polynomial of degree at most $n-3$}} \bigg) + \cdots}_{\text{polynomial of degree at most $n-2$}}\\ &\kern10pt\vdots\\[10pt] &=(x-a_{11})(x-a_{22})\cdots(x-a_{nn}) + \underbrace{\cdots}_{\text{polynomial of degree at most $n-2$}}\\[10pt] &= x^n - (a_{11}+a_{22}+\cdots + a_{nn})x^{n-1} + \cdots\\[10pt] &= x^n - (\operatorname{tr}A)x^{n-1} + \cdots \end{align*} (Let me know if there is a more elegant way to obtain this.)

But what I cannot seem to prove is that the coefficient of $x$ is $ (-1)^{n-1}\operatorname{tr}(\operatorname{adj}A)$. Attempting to work with the Laplace expansion (as I did to obtain the trace above) is leading me nowhere.


Solution 1:

Suppose first $B$ is an $n \times n$ matrix with $\det(B) = 1$. Then $\text{adj}(B) = B^{-1}$, and

$$ \phi_{B^{-1}}(x) = \det(x I - B^{-1}) = \det(x B - I) = (-x)^n \det(x^{-1} I - B) = (-x)^n \phi_B(1/x)$$

The coefficient of $x^{1}$ in $\phi_B(x)$ is the coefficient of $x^{-1}$ in $\phi_B(1/x)$, i.e. $(-1)^n$ times the coefficient of $x^{n-1}$ in $(-x)^n \phi_B(1/x)$, but you know the coefficient of $x^{n-1}$ in $\phi_{B^{-1}}(x)$ is $-\text{tr}(B^{-1})$. That is, the coefficient of $x^1$ in $\phi_B(x)$ is $(-1)^n \cdot \left(-\text{tr}(B^{-1})\right) = (-1)^{n-1} \text{tr}(B^{-1}) = (-1)^{n-1} \text{tr}(\text{adj}\; B)$.
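
A small SymPy sketch of this step (my addition, with a hand-picked determinant-$1$ matrix built from unit triangular factors) checks both the functional equation and the resulting coefficient:

```python
import sympy as sp

x = sp.symbols('x')
n = 3
# a determinant-1 matrix, built as a product of unit triangular factors
L = sp.Matrix([[1, 0, 0], [2, 1, 0], [-1, 3, 1]])
U = sp.Matrix([[1, -2, 4], [0, 1, 5], [0, 0, 1]])
B = L * U
assert B.det() == 1

phi_B = sp.expand((x * sp.eye(n) - B).det())
phi_Binv = sp.expand((x * sp.eye(n) - B.inv()).det())

# functional equation: phi_{B^{-1}}(x) = (-x)^n phi_B(1/x)
assert sp.expand(phi_Binv - (-x) ** n * phi_B.subs(x, 1 / x)) == 0
# x^1-coefficient: (-1)^n * (-tr B^{-1}) = (-1)^(n-1) tr(adj B)
assert phi_B.coeff(x, 1) == (-1) ** (n - 1) * B.adjugate().trace()
```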

Now let $A = t B$, where $t \ne 0$. Then $\phi_A(x) = \det(x I - t B) = t^n \det((x/t)I - B) = t^n \phi_B(x/t)$. The coefficient of $x^1$ in $\phi_A(x)$ is then $t^{n-1}$ times the coefficient of $x^1$ in $\phi_B(x)$. But also $\text{adj}\; A = t^{n-1} \text{adj}\; B$. So we again obtain that the coefficient of $x^1$ in $\phi_A(x)$ is $(-1)^{n-1} \text{tr}(\text{adj}\; A)$.
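
The two homogeneity facts used in this step, $\operatorname{adj}(tB)=t^{n-1}\operatorname{adj}B$ and $\det(xI-tB)=t^n\phi_B(x/t)$, can be confirmed symbolically; a minimal SymPy sketch with a generic $3\times 3$ matrix:

```python
import sympy as sp

x, t = sp.symbols('x t')
n = 3
B = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'b{i + 1}{j + 1}'))

# adj(tB) = t^(n-1) adj(B), entrywise
assert ((t * B).adjugate() - t ** (n - 1) * B.adjugate()).expand() == sp.zeros(n, n)

# det(xI - tB) = t^n phi_B(x/t)
phi_B = (x * sp.eye(n) - B).det()
delta = (x * sp.eye(n) - t * B).det() - t ** n * phi_B.subs(x, x / t)
assert sp.expand(delta) == 0
```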

Every nonsingular matrix can be written as $A = \det(A)^{1/n} B$ where $\det(B) = 1$, so the formula for the coefficient holds for every nonsingular matrix. (Note that in the case where $\det(A)$ is negative and $n$ is even, we must use complex numbers here, but that's not a problem.)

Now the coefficient of $x^1$ in $\phi_A(x)$ and $(-1)^{n-1} \text{tr}(\text{adj}\; A)$ are both polynomials in the entries of $A$, and they agree on the dense set of nonsingular matrices.
Thus the equation must hold for all $n \times n$ matrices, not just the nonsingular ones.

Solution 2:

This can easily be derived from Corollary 2 in https://math.stackexchange.com/a/3069061/ (applied to $-A$ instead of $A$) by setting $r = 1$, $r = n-1$ or $r = n$. If you follow the reference to my notes, you'll see that I basically make the "expand everything and extract the coefficients of the appropriate powers of $x$" argument that you already did for the $x^{n-1}$-coefficient and that @J_P sketched for the other coefficients; all I am doing differently is actually formalizing it.

For another proof, see my note The trace Cayley-Hamilton theorem (temporary mirror while the website just linked is offline), specifically:

  • Corollary 2.4 (b) for the $x^0$-coefficient being $\left(-1\right)^n \det A$;

  • Theorem 5.32 for the $x^1$-coefficient being $\left(-1\right)^{n-1} \operatorname{Tr}\left(\operatorname{adj}A\right)$;

  • Corollary 3.22 for the $x^{n-1}$-coefficient being $- \operatorname{Tr} A$.

Note that my goal in that note (at least the original goal; feature creep has since produced all of Section 5 and parts of the preceding sections) is not to find the coefficients of the characteristic polynomial, so I am not taking the most direct route to them. There isn't much to say in favor of my proof other than that it avoids the handwaving that typically appears in the standard (term-by-term expansion) argument.

Solution 3:

I define $\phi_A(\lambda)=\det(A-\lambda I)$. You get the coefficient of $\lambda^{n-k}$ in the following way: pick $k$ diagonal entries $a_{i_1i_1},...,a_{i_ki_k}$. These define a $k\times k$ principal submatrix of $A$ that you get by removing every row and column that does not contain one of the chosen diagonal entries. Calculate the determinant of this submatrix and sum over all $n\choose k$ selections of the $k$ diagonal entries. Up to the sign $(-1)^{n-k}$ (which just alternates), that is your $\lambda^{n-k}$ coefficient.

I have to say that I don't know a precise formal proof (EDIT: go to the bottom of the answer, I've now added an attempt at formalizing what follows); I like to view this in a more algorithmic fashion. Here's how I convince myself: $$ \begin{vmatrix} a_{11}-\lambda & a_{12} & ... & a_{1n} \\ a_{21} & a_{22}-\lambda & ... & a_{2n} \\ ... & ... & ... & ... \\ a_{n1} & a_{n2} & ...& a_{nn}-\lambda \end{vmatrix}=\\ =\begin{vmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22}-\lambda & ... & a_{2n} \\ ... & ... & ... & ... \\ a_{n1} & a_{n2} & ...& a_{nn}-\lambda \end{vmatrix}-\lambda\begin{vmatrix} a_{22}-\lambda & ... & a_{2n} \\ ... & ... & ... \\ a_{n2} & ...& a_{nn}-\lambda \end{vmatrix} $$ Now you see the problem has split into two parts. Deal with each one separately: in the first one, move on to the next column and again split the determinant up as $$ \begin{vmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22}-\lambda & ... & a_{2n} \\ ... & ... & ... & ... \\ a_{n1} & a_{n2} & ...& a_{nn}-\lambda \end{vmatrix}=\begin{vmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22} & ... & a_{2n} \\ ... & ... & ... & ... \\ a_{n1} & a_{n2} & ...& a_{nn}-\lambda \end{vmatrix}-\lambda\begin{vmatrix} a_{11} & a_{13} & ... & a_{1n} \\ a_{31} & a_{33}-\lambda & ... & a_{3n} \\ ... & ... & ... & ... \\ a_{n1} & a_{n3} & ...& a_{nn}-\lambda \end{vmatrix} $$ Do the same for the second one. As you keep doing this, a huge number of determinants will crop up ($2^n$ in total - which is just the sum of the binomial coefficients $n\choose k$, already suggestive). You can view this as a decision tree: at each step, you say whether you want to keep the $i$-th column or not. After you pick, you won't touch the previous columns again. So doing this will give you square determinants with certain columns missing. If you pick some columns (equivalently, some diagonal entries) to keep, say "NO" to the removal decision for each of those columns and "YES" for all the others. This defines a path down the binary tree which terminates when no more $\lambda$'s are left. Since every selection of $k$ diagonal elements corresponds to exactly one path down the binary tree with $n-k$ "YES"'s, you will get $(-1)^{n-k}\lambda^{n-k}$ in front of each such determinant, and so when you sum it all up, you get precisely the sum of all diagonal-centered $k\times k$ determinants.

In particular, you get the linear term by removing only $1$ row and column and summing over all possibilities. The resulting $(n-1)\times(n-1)$ principal minors are exactly the diagonal cofactors that appear in the definition of the adjugate matrix, so what you calculate is just the sum of the diagonal entries of the adjugate - that is, its trace (up to the sign $(-1)^{n-1}$ from the rule above).

If you don't believe all this mumbo-jumbo (which is perfectly reasonable), I strongly encourage you to try a $4\times 4$ case since that is only $16$ determinants in total, or, if you have a lot of paper, the $5\times 5$ case. That might help you get a "feel" for the pattern.
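
Alternatively, a computer algebra system can grind out the check; here is a SymPy sketch (my addition) of the claimed rule, namely that the coefficient of $\lambda^{n-k}$ is $(-1)^{n-k}$ times the sum of the $k\times k$ principal minors, for a generic $4\times 4$ matrix:

```python
import sympy as sp
from itertools import combinations

lam = sp.symbols('lambda')
n = 4
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'a{i + 1}{j + 1}'))

char = sp.expand((A - lam * sp.eye(n)).det())
for k in range(n + 1):
    # sum of all k-by-k principal minors of A (the empty minor is 1)
    minors = (sum(A[list(S), list(S)].det() for S in combinations(range(n), k))
              if k else sp.Integer(1))
    assert sp.expand(char.coeff(lam, n - k) - (-1) ** (n - k) * minors) == 0
```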

I also think this exact argument could be made more rigorous if you threw in some arguments like "at the $i$-th step, determinants of a certain size will appear at a certain power of $\lambda$" and try doing this by induction, but I never actually went through with this so it's just a hunch at this point.


So I tried my hand at formalizing this a bit and here's my attempt. Let $D(b_1,b_2,...,b_n)$ denote the various determinants that will appear, where $b_i\in\{0,1,2\}$. $b_i=0$ means that you remove the $i$-th row and column entirely, $b_i=1$ means that you only remove the $\lambda$ at $a_{ii}$, and $b_i=2$ means "leave the $i$-th diagonal element and its row and column as is". For example: $$ D(2,2,2)=\begin{vmatrix} a_{11}-\lambda & a_{12} & a_{13} \\ a_{21} & a_{22}-\lambda & a_{23} \\ a_{31} & a_{32} & a_{33}-\lambda \end{vmatrix}\quad D(2,1,1)=\begin{vmatrix} a_{11}-\lambda & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}\\ D(2,1,0)=\begin{vmatrix} a_{11}-\lambda & a_{12}\\ a_{21} & a_{22} \end{vmatrix} $$ We have the reduction formula: $$D(b_1,...,2,...,b_n)=D(b_1,...,1,...,b_n)-\lambda D(b_1,...,0,...,b_n)$$ We start with $D(2_1,2_2,...,2_n)$. Write $\bf{b}$ for the vector of $b_i$ and let $\Vert {\bf b}\Vert$ be the number of $1$'s in $\bf{b}$. If we proceed by induction, we can say that $$D(0,2_1,...,2_{n-1})=\sum_{\dim{\bf b}=n-1\\\quad \,2\not\in{\bf b}}D(0,{\bf b})(-\lambda)^{n-1-\Vert{\bf b}\Vert}$$ Since the $0$ at the front plays no role and the reduction formula is the same regardless of what the first $b_i$ is, we must also have: $$D(1,2_1,...,2_{n-1})=\sum_{\dim{\bf b}=n-1\\\quad \,2\not\in{\bf b}}D(1,{\bf b})(-\lambda)^{n-1-\Vert{\bf b}\Vert}$$ This can be rigorously demonstrated by observing that if we apply the reduction formula to $D(0,{\bf b})$ at some index, we get the same result as we would for $D(1,{\bf b})$ or $D(2,{\bf b})$, apart from the first index. By applying the reduction inductively on the indices, we eliminate all $2$'s, and since at each step the formula is the same except for the first index, after all is said and done we are left with identical formulas that differ only in the first index.
Anyway, we have: $$ D(2_1,2_2,...,2_n)=D(1,2,...,2)-\lambda D(0,2,...,2)=\\ =\sum_{\dim{\bf b}=n-1\\\quad \,2\not\in{\bf b}}\left(D(1,{\bf b})(-\lambda)^{n-1-\Vert{\bf b}\Vert}+D(0,{\bf b})(-\lambda)^{n-\Vert{\bf b}\Vert}\right)=\sum_{\dim{\bf b}=n\\\,\,\,\,\, 2\not\in{\bf b}}D({\bf b})(-\lambda)^{n-\Vert{\bf b}\Vert} $$ This establishes the inductive step and completes the proof.
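
Since the $D(\mathbf b)$ notation is essentially algorithmic, it can be transcribed into code; the helper `D` below is my own transcription, and the final identity $D(2,\dots,2)=\sum_{2\not\in\mathbf b}D(\mathbf b)(-\lambda)^{n-\Vert\mathbf b\Vert}$ is checked with SymPy for $n=3$:

```python
import sympy as sp
from itertools import product

lam = sp.symbols('lambda')
n = 3
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'a{i + 1}{j + 1}'))

def D(b):
    """b_i = 0: delete row/column i; b_i = 1: keep a_ii as is;
    b_i = 2: keep a_ii - lambda."""
    keep = [i for i, bi in enumerate(b) if bi != 0]
    if not keep:                         # the empty determinant is 1
        return sp.Integer(1)
    M = sp.Matrix([[A[i, j] - (lam if i == j and b[i] == 2 else 0)
                    for j in keep] for i in keep])
    return M.det()

lhs = D((2,) * n)                        # det(A - lambda*I)
rhs = sum(D(b) * (-lam) ** (n - sum(b))  # ||b|| = number of 1's = sum(b)
          for b in product((0, 1), repeat=n))
assert sp.expand(lhs - rhs) == 0
```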


Due to the helpful suggestion of @amd in the comments, here's an approach with Leibniz's formula. Here, $N=\{1,...,n\}$, and I write $S$ (rather than $A$, which is taken by the matrix) for a subset of $N$, with $\Vert S\Vert$ its number of elements. \begin{align} \det(A-\lambda I)&=\sum_{\sigma\in\mathrm{Sym}(n)}\mathrm{sgn}(\sigma)\prod_{i=1}^n(a_{i\sigma(i)}-\lambda\delta_{i\sigma(i)})\\ &=\sum_{\sigma\in\mathrm{Sym}(n)}\mathrm{sgn}(\sigma)\sum_{S\subset N}\prod_{i\in S}a_{i\sigma(i)}\prod_{j\in N-S}(-\lambda)\delta_{j\sigma(j)}\\ &=\sum_{S\subset N}(-\lambda)^{n-\Vert S\Vert}\sum_{\sigma\in\mathrm{Sym}(n)}\mathrm{sgn}(\sigma)\prod_{i\in S}a_{i\sigma(i)}\prod_{j\in N-S}\delta_{j\sigma(j)} \end{align} In the last expression, the factor $\prod_{j\in N-S}\delta_{j\sigma(j)}$ is $0$ unless $N-S$ contains only fixed points of $\sigma$ - in other words, we need only sum over those $\sigma$ whose set of fixed points contains $N-S$. But these permutations form a group which is essentially just $\mathrm{Sym}(S)$, the permutations of $S$. The sign of a permutation of length $m$ is $(-1)^k$ where $k=m-\text{(number of disjoint cycles that make up $\sigma$)}$; when we drop the fixed points from $\sigma$, $m$ decreases, but the number of disjoint cycles decreases by the exact same amount, since fixed points correspond to trivial cycles. Hence the sign remains the same and we have: $$ \det(A-\lambda I)=\sum_{S\subset N}(-\lambda)^{n-\Vert S\Vert}\sum_{\sigma\in\mathrm{Sym}(S)}\mathrm{sgn}(\sigma)\prod_{i\in S}a_{i\sigma(i)} $$ But the inner sums over $\sigma\in\mathrm{Sym}(S)$ are now just the determinants of the principal submatrices (the principal minors) and we are done.
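
The fixed-point bookkeeping can likewise be machine-checked: the SymPy sketch below (my addition) sums $\mathrm{sgn}(\sigma)\prod_{i\in S}a_{i\sigma(i)}$ over permutations of each subset $S$, extended by fixed points to all of $N$, and compares the result with the determinant for $n=3$:

```python
import sympy as sp
from itertools import combinations, permutations
from sympy.combinatorics import Permutation

lam = sp.symbols('lambda')
n = 3
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'a{i + 1}{j + 1}'))

total = sp.Integer(0)
for k in range(n + 1):
    for S in combinations(range(n), k):
        inner = sp.Integer(0)
        for image in permutations(S):                   # sigma restricted to S
            sigma = dict(zip(S, image))
            full = [sigma.get(i, i) for i in range(n)]  # extend by fixed points
            sgn = Permutation(full).signature()         # sign survives the extension
            inner += sgn * sp.Mul(*(A[i, sigma[i]] for i in S))
        total += (-lam) ** (n - k) * inner
assert sp.expand(total - (A - lam * sp.eye(n)).det()) == 0
```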

Solution 4:

The above answers are excellent and are a bit more general than what I had in mind, but if you only wanted the coefficient of $x$ in $\phi_A(x)$, here is a reasonably straightforward way to get it: this coefficient is just $\phi_A'(0)$, so if we write $A(x)=xI-A$, we are looking for the derivative of the composite function $\det(A(x))$. For any square matrix $M$, $\det(M)=\det(\{M_{i,j}\}_{i,j=1}^n)$ is some polynomial function of the entries, and by the cofactor expansion, we can expand along any row $i$ to find that $$\det(M)=\sum_{j=1}^n M_{i,j}C_{i,j},$$ where $C_{i,j}$ is the $(i,j)$ cofactor of $M$. Note that $C_{i,j}$ is not a function of $M_{i,j}$. In particular, for any indices $i,k$, the product rule gives $$\frac{\partial \det(M)}{\partial M_{i,k}}=\sum_{j=1}^n\left(\frac{\partial M_{i,j}}{\partial M_{i,k}}C_{i,j}+M_{i,j}\frac{\partial C_{i,j}}{\partial M_{i,k}}\right).$$ But $\partial M_{i,j}/\partial M_{i,k}=0$ unless $j=k$ (in which case it is $1$), and $\partial C_{i,j}/\partial M_{i,k}=0$, as $C_{i,j}$ is obtained by taking a determinant that excludes the $i$th row. Therefore, $$\frac{\partial \det(M)}{\partial M_{i,k}}=C_{i,k}.$$
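
This identity, $\partial\det(M)/\partial M_{i,k}=C_{i,k}$ (an entrywise form of Jacobi's formula), is easy to confirm symbolically; a minimal SymPy sketch for a generic $3\times 3$ matrix:

```python
import sympy as sp

n = 3
M = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'm{i + 1}{j + 1}'))
d = M.det()

# d(det M)/d(M_ik) equals the (i,k) cofactor of M
for i in range(n):
    for k in range(n):
        assert sp.expand(sp.diff(d, M[i, k]) - M.cofactor(i, k)) == 0
```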

In our case, the chain rule gives $$ \phi_A'(0)=\sum_{i,j=1}^n \frac{\partial \det(A(0))}{\partial A_{i,j}}\frac{\partial A_{i,j}}{\partial x}(0)=\sum_{i=1}^n C_{i,i},$$ where $C_{i,i}$ is the $(i,i)$-cofactor of $A(0)=-A$. All the off-diagonal terms vanish because $\partial A_{i,j}/\partial x=1$ if $i=j$ and is zero otherwise, as only diagonal entries are affected by adding $xI$. Finally, $\sum_{i=1}^n C_{i,i}=\operatorname{tr}(\operatorname{adj}(-A))=(-1)^{n-1}\operatorname{tr}(\operatorname{adj}(A))$, since each cofactor of $-A$ is the determinant of an $(n-1)\times(n-1)$ submatrix of $-A$ and so picks up a factor $(-1)^{n-1}$.
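
And a check of the final computation (again a SymPy sketch, my addition): $\phi_A'(0)$ is compared against $\operatorname{tr}(\operatorname{adj}(-A))=(-1)^{n-1}\operatorname{tr}(\operatorname{adj}A)$ for a generic $3\times 3$ matrix.

```python
import sympy as sp

x = sp.symbols('x')
n = 3
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'a{i + 1}{j + 1}'))

phi = (x * sp.eye(n) - A).det()
deriv_at_0 = sp.diff(phi, x).subs(x, 0)   # phi_A'(0)
assert sp.expand(deriv_at_0 - (-A).adjugate().trace()) == 0
assert sp.expand(deriv_at_0 - (-1) ** (n - 1) * A.adjugate().trace()) == 0
```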