Is the proof of this lemma really necessary?

Solution 1:

The reason we need the lemma is that from $P(t)=b(t)(A-tI)$ one cannot directly conclude that $P(A)=b(A)(A-AI)$.

If $R$ is a commutative ring, then there is a natural map $R[t]\to R^R$ which is a ring homomorphism (we endow $R^R$ with the pointwise ring structure: $(f+g)(r) = f(r)+g(r)$ and $(fg)(r) = f(r)g(r)$ for every $r\in R$). If $p(t)=q(t)s(t)$, then for every $r\in R$ you have $p(r)=q(r)s(r)$.
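To make the commutative case concrete, here is a minimal sketch (assuming `sympy` is available) checking that evaluation commutes with multiplication over the commutative ring $\mathbb{Z}$; the particular polynomials and the evaluation point are arbitrary choices:

```python
# Over a commutative ring (here Z), evaluating a product of polynomials
# gives the product of the evaluations: p(r) = q(r) s(r).
from sympy import symbols, Poly

t = symbols('t')
q = Poly(t**2 + 3*t + 1, t)   # an arbitrary q(t)
s = Poly(2*t - 5, t)          # an arbitrary s(t)
p = q * s                     # p(t) = q(t) s(t) in Z[t]

r = 7                         # an arbitrary ring element
print(p.eval(r) == q.eval(r) * s.eval(r))  # True
```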

But this doesn't work if $R$ is not commutative. For example, taking $p(t) = at$, $q(t) = t$ and $s(t)=a$, you have $p(t)=q(t)s(t)$ in $R[t]$ (since $t$ is central in $R[t]$ even when $R$ is not commutative), but $p(r) = ar$ while $q(r)s(r) = ra$. So you get $p(r)=q(r)s(r)$ if and only if $a$ and $r$ commute. Thus, while you can certainly define a map $\psi\colon R[t]\to R^R$ by $$\psi(a_0+a_1t+\cdots+a_nt^n)(r) = a_0 + a_1r + \cdots + a_nr^n,$$ this map is not a ring homomorphism when the ring is not commutative.

This is the situation we have here, where $R$ is the ring of $n\times n$ matrices over $\mathbb{K}$, which is not commutative when $n\gt 1$. In particular, from $P(t) = b(t)(A-tI)$ one cannot simply conclude that $P(A)=b(A)(A-AI)$: that would implicitly assume that the map $M_n(\mathbb{K})[t]\to M_n(\mathbb{K})^{M_n(\mathbb{K})}$ is multiplicative, which it is not in this case.
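Here is a minimal sketch (using `numpy`) of the $p(t)=q(t)s(t)$ counterexample above, with $a$ and $r$ arbitrary non-commuting $2\times 2$ matrices standing in for elements of $R=M_2(\mathbb{K})$:

```python
# In a noncommutative ring, p(t) = q(t) s(t) does not force p(r) = q(r) s(r):
# with p(t) = a t, q(t) = t, s(t) = a, we get p(r) = a r but q(r) s(r) = r a.
import numpy as np

a = np.array([[0., 1.],
              [0., 0.]])
r = np.array([[1., 0.],
              [0., 2.]])

print(np.array_equal(a @ r, r @ a))  # False: a and r do not commute,
                                     # so p(r) = a r differs from q(r) s(r) = r a
```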

If your $A$ happens to be central in $M_n(\mathbb{K})$, then it is true that the induced map $M_n(\mathbb{K})[t]\to M_n(\mathbb{K})$ is a homomorphism. But then you would be assuming that your $A$ is a scalar multiple of the identity. It would also be true if the coefficients of the polynomial $b(t)$ centralize $A$, but you are not assuming that. So you do need to prove that in this case you have $P(A)=b(A)(A-AI)$, since it does not follow from the general set-up (the way it would in a commutative setting).
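That said, the lemma's conclusion can be checked numerically: the right-hand factor $A-tI$ has coefficients ($A$ and $-I$) that do commute with $A$, which is consistent with the sufficient condition noted in Solution 2 below. A minimal sketch (using `numpy`, with arbitrary random matrices $B_0$, $B_1$ standing in for the coefficients of a hypothetical $b(t)=B_1t+B_0$):

```python
# Numerical check of the lemma's conclusion: with b(t) = B1 t + B0 and
# P(t) = b(t)(A - tI), expanding in M_2[t] (t is central) gives
# P(t) = -B1 t^2 + (B1 A - B0) t + B0 A, and evaluating at A yields 0,
# which equals b(A)(A - A I).
import numpy as np

rng = np.random.default_rng(0)
A  = rng.standard_normal((2, 2))
B0 = rng.standard_normal((2, 2))
B1 = rng.standard_normal((2, 2))

P_of_A = -B1 @ A @ A + (B1 @ A - B0) @ A + B0 @ A
b_of_A = B1 @ A + B0

print(np.allclose(P_of_A, b_of_A @ (A - A)))  # True: both sides are 0
```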

P.S. In fact, this is the subtle point where the proof that a polynomial of degree $n$ over a field has at most $n$ roots breaks down for skew fields/division rings. If $K$ is a division ring, then the division algorithm holds for polynomials with coefficients in $K$, so one can show that for every $p(t)\in K[t]$ and $a(t)\in K[t]$, $a(t)\neq 0$, there exist unique $q(t)$ and $r(t)$ such that $p(t)=q(t)a(t) + r(t)$ and $r(t)=0$ or $\deg(r)\lt \deg(a)$. From this, we can deduce that for every polynomial $p(t)$ and every $a\in K$, we can write $p(t) = q(t)(t-a) + r$, where $r\in K$.

But the proofs of the Remainder and Factor Theorems no longer go through, because we cannot pass from $p(t)=q(t)(t-a)+r$ to $p(a)=q(a)(a-a)+r$; and the recursion argument fails as well, because from $p(t)=q(t)(t-a)$ and $p(b)=0$ with $b\neq a$, we cannot deduce that $q(b)=0$. For instance, over the real quaternions we have $p(t)=t^2+1=(t+i)(t-i)$, but $p(j)=j^2+1=0$, whereas $(j+i)(j-i) = ij-ji = 2k\neq 0$. I remember that when I first learned the corresponding theorems for polynomial rings, the professor challenged us to identify all the field axioms used in the proofs of the Remainder and Factor Theorems; none of us spotted the use of commutativity in the evaluation map.
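The quaternion computation is easy to verify in the standard $2\times 2$ complex-matrix model of the quaternions; a minimal sketch using `numpy`:

```python
# Model the quaternions as 2x2 complex matrices: 1 -> I, i -> qi, j -> qj,
# k -> qk = qi qj. Then p(t) = t^2 + 1 = (t + i)(t - i), yet p(j) = 0
# while (j + i)(j - i) = 2k != 0.
import numpy as np

I  = np.eye(2, dtype=complex)
qi = np.array([[1j, 0], [0, -1j]])
qj = np.array([[0, 1], [-1, 0]], dtype=complex)
qk = qi @ qj

print(np.allclose(qj @ qj + I, 0))                 # True: p(j) = j^2 + 1 = 0
print(np.allclose((qj + qi) @ (qj - qi), 2 * qk))  # True: the factors give 2k
```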

Solution 2:

It's not clear to me what the exact problem is here, but there is a common pitfall concerning polynomials with matrix coefficients.

Given a polynomial $P(t)=\sum_i B_i t^i$, we can certainly substitute a matrix $A$ for $t$ to get $P(A)=\sum_i B_i A^i$. But there is a catch when we multiply polynomials. If $Q(t)$ is another polynomial with matrix coefficients, then we may have $$(PQ)(A)\ne P(A)Q(A).$$ A sufficient condition for $(PQ)(A)=P(A)Q(A)$ to hold is that $A$ commutes with the coefficients of $Q$; when it doesn't, the equality fails in general.
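A minimal sketch of such a failure (using `numpy`; the polynomials $P(t)=Bt$ and $Q(t)=C$ and the particular matrices are arbitrary illustrative choices):

```python
# P(t) = B t and Q(t) = C (constant), so (PQ)(t) = B C t since t is central.
# If A does not commute with the coefficient C of Q, then
# (PQ)(A) = B C A differs from P(A) Q(A) = B A C.
import numpy as np

B = np.array([[1., 1.],
              [0., 1.]])
C = np.array([[0., 1.],
              [0., 0.]])
A = np.array([[1., 0.],
              [0., 2.]])

print(np.allclose(B @ C @ A, (B @ A) @ C))  # False: A C != C A
```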