Question about Axler's proof that every linear operator has an eigenvalue
I am puzzled by Sheldon Axler's proof that every linear operator on a finite-dimensional complex vector space has an eigenvalue (Theorem 5.10 in "Linear Algebra Done Right"). What puzzles me in particular is his maneuver in the last set of displayed equations, where he substitutes the linear operator $T$ for the complex variable $z$. See below.
What principle allows Axler to make this substitution of linear operator for complex variable and still claim that the RHS equals the LHS, and that the LHS goes from being the number zero (in the preceding set of equations) to the zero vector (in the final set of equations)?
As a possible answer to my own question, on the preceding page he offers these remarks:
Is linearity of the map from "polynomial over complex variable" to "polynomial over T" the property that makes this work? If so, I don't feel very enlightened. Any further insight here is greatly appreciated, especially since Axler makes a side comment that this proof is much clearer than the usual proof via the determinant.
The essential point here is not just linearity, but also compatibility with multiplication.
This means the following. When $$p(X) = \sum_n a_n X^n, \qquad q(X) = \sum_n b_n X^n$$ we define multiplication of polynomials in the usual way by $$(pq)(X) = \sum_n c_n X^n, \quad \text{ where } c_n = \sum_{i + j = n} a_i b_j.$$
Then not only do we have $$(pq)(z) = p(z)q(z)$$ for any $z \in \mathbf{C}$, but also $$(pq)(T) = p(T)q(T)$$ for any vector space endomorphism $T$.
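To see this in a small concrete case (my own illustration, not part of the argument above): take $p(X) = X - 2$ and $q(X) = X - 3$, so that $(pq)(X) = X^2 - 5X + 6$. Then for an operator $T$ we get $$(pq)(T) = T^2 - 5T + 6I = (T - 2I)(T - 3I) = p(T)\,q(T),$$ where expanding the right-hand side uses only distributivity and the fact that $T$ commutes with scalar multiples of the identity.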
By induction, this can be extended to products of an arbitrary finite number of polynomials.
Thus the equality of polynomials $$a_0 + a_1 X + \ldots + a_n X^n = c(X - \lambda_1) \cdots (X - \lambda_n)$$ entails both $$a_0 + a_1 z + \ldots + a_n z^n = c(z - \lambda_1) \cdots (z - \lambda_n)$$ and $$a_0 I + a_1 T + \ldots + a_n T^n = c(T - \lambda_1 I) \cdots (T - \lambda_n I).$$

In abstract terms, for any $\mathbf{C}$-algebra $A$ and an element $a \in A$, the evaluation map $\operatorname{ev}_a \colon \mathbf{C}[X] \to A, \ p \mapsto p(a)$ is a morphism of $\mathbf{C}$-algebras. This holds in particular both for $A = \mathbf{C}$ and for $A = L(V)$.
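If a numerical check helps, here is a small sketch (my own, not from Axler): it evaluates a polynomial at an arbitrarily chosen complex matrix both from its coefficients and from its factored form, and confirms the two results agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # an arbitrary operator on C^n
I = np.eye(n)

# A polynomial given by its roots: p(X) = (X - r_1)(X - r_2)(X - r_3)
roots = np.array([1.0, 2.0 - 1.0j, -0.5j])
coeffs = np.poly(roots)                      # a_n, ..., a_0 (highest degree first), with a_n = 1

# Evaluate p(T) from the coefficients: a_0 I + a_1 T + ... + a_n T^n
from_coeffs = sum(a * np.linalg.matrix_power(T, k)
                  for k, a in enumerate(reversed(coeffs)))

# Evaluate p(T) from the factored form: (T - r_1 I)(T - r_2 I)(T - r_3 I)
from_factors = I
for r in roots:
    from_factors = from_factors @ (T - r * I)

print(np.allclose(from_coeffs, from_factors))  # True: both ways of evaluating agree
```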
You are right that there is more going on here than just linearity. The principle used here is that for any linear operator $T$ on a space $V$ over a field $K$, the map $\eta_T:\def\End{\operatorname{End}}K[X]\to\End(V)$ given by $\sum_ic_iX^i\mapsto\sum_ic_iT^i$ (substitution of the operator $T$ for the indeterminate $X$, or "evaluation in $T$") is a morphism of rings: not only is it $K$-linear, but it also respects multiplication: $\eta_T(PQ)=\eta_T(P)\circ\eta_T(Q)$.

Checking the validity of this is a fairly formal matter: the way the product $PQ$ of polynomials is computed just uses rearranging/combining of powers of $X$ and scalars, and the same operations are valid when $X$ is systematically replaced by$~T$. Note that one particular instance is the commutation of scalars and powers: $aX^i\,bX^j=ab\,X^iX^j=ab\,X^{i+j}$, and the corresponding equality $aT^i\,bT^j=ab\,T^{i+j}$ is valid because $T$ commutes with scalar multiplication, which is because it is a linear operator.

Note that if one were to try the same thing with a non-linear operator in place of $T$, then the resulting map $K[X]\to V^V$ (we no longer get an endomorphism as result, but still a map $V\to V$, and the set of all such maps is still a $K$-vector space) would still be $K$-linear, but would fail to respect multiplication.
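To make that last point concrete, here is a toy illustration of my own (not from the answer itself): with the non-linear map $S(x)=x^2$ on $\mathbf{R}$, "evaluation at $S$" is still defined and still linear in the polynomial, but it no longer turns products of polynomials into compositions of maps.

```python
def S(x):
    return x ** 2            # non-linear: S(a*x) != a*S(x) in general

def L(x):
    return 3.0 * x           # linear, for comparison

def ev(coeffs, f, x):
    """Evaluate sum_i coeffs[i] * f^{o i}(x): the polynomial with the map f
    substituted for X, powers of X becoming iterated composition of f."""
    total, y = 0.0, x        # y runs through x, f(x), f(f(x)), ...
    for c in coeffs:
        total += c * y
        y = f(y)
    return total

P = [0.0, 1.0]               # P(X) = X
Q = [1.0, 1.0]               # Q(X) = 1 + X
PQ = [0.0, 1.0, 1.0]         # (PQ)(X) = X + X^2

x = 1.0
print(ev(PQ, S, x), ev(P, S, ev(Q, S, x)))   # 2.0 vs 4.0: (PQ)[S] != P[S] o Q[S]
print(ev(PQ, L, x), ev(P, L, ev(Q, L, x)))   # 12.0 vs 12.0: equality does hold for a linear map
```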
By the way, I find that one can do a bit better than the cited proof (which is the centrepiece of Axler's somewhat dubious "down with determinants" claim). Note that if one uses the first linear dependency that arises among the vectors $v,Tv,T^2v,\ldots$, say $0=c_0v+c_1Tv+\cdots+c_dT^dv$, then $c_d\neq0$ by minimality, and after dividing everything by it one may assume $c_d=1$. This means that $p[T](v)=0$ where $p=X^d+c_{d-1}X^{d-1}+\cdots+c_1X+c_0$, and no monic polynomial in$~T$ of lower degree vanishes when applied to$~v$ (because by the minimality assumption $v,Tv,\ldots,T^{d-1}v$ are linearly independent). The subspace $V'=\ker(p[T])$ contains $v$ so it is nonzero, and it is closed under application of$~T$; it will be all of$~V$ if $d=n$ (as often happens), but in any case we can henceforth restrict our attention to $V'$, and call $T'$ the restriction of $T$ to $V'$. Then $p$ is the minimal polynomial of$~T'$: one has $p[T']=0$, and $p$ is the monic polynomial of minimal degree with this property. (This minimal polynomial property is all that will be used.)
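Here is a rough numerical sketch of that first step (my own illustration; the helper name and tolerance are mine, and it is not meant to be numerically robust): given a matrix $T$ and a vector $v$, it locates the first $d$ for which $T^dv$ depends linearly on $v,Tv,\ldots,T^{d-1}v$ and returns the monic coefficients $c_0,\ldots,c_{d-1},1$ of $p$.

```python
import numpy as np

def first_dependency(T, v, tol=1e-10):
    """Monic coefficients [c_0, ..., c_{d-1}, 1] (lowest degree first) of the
    lowest-degree monic p with p[T](v) = 0, found from the first linear
    dependency among v, Tv, T^2 v, ..."""
    n = T.shape[0]
    basis = []                       # v, Tv, ..., T^{d-1} v, independent so far
    w = v.astype(complex)
    for d in range(n + 1):
        if basis:
            K = np.column_stack(basis)
            x, *_ = np.linalg.lstsq(K, w, rcond=None)
            if np.linalg.norm(K @ x - w) < tol:
                # T^d v = sum_i x[i] T^i v, so p = X^d - sum_i x[i] X^i
                return np.append(-x, 1.0)
        basis.append(w)
        w = T @ w

T = np.array([[2.0, 1.0], [0.0, 3.0]])
v = np.array([1.0, 0.0])
print(first_dependency(T, v))        # d = 1 here: Tv = 2v, so p = X - 2, coefficients [-2, 1]
```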
In this situation not only does $p$ have some root that is an eigenvalue of $T'$, and therefore of $T$ (assuming one works over the complex numbers), but every root of$~p$ is an eigenvalue (this is true over arbitrary fields, though of course it is only over algebraically closed fields that one knows such roots exist). Here's why: if $\lambda$ is a root of$~p$, one can factor out $X-\lambda$ and write $p=(X-\lambda)q$, where $\deg q=\deg p-1$, so that for the degree reasons mentioned above $q[T'](v)\neq0$ and in particular $q[T']\neq0$. Now $0=p[T']=(T'-\lambda I)\circ q[T']$, so every vector in the (nonzero) image of $q[T']$ is an eigenvector of$~T'$ for the eigenvalue$~\lambda$, in other words an eigenvector of$~T$ for$~\lambda$ lying in the subspace$~V'$.
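Continuing the previous sketch (again my own illustration, with hypothetical helper names): given a root $\lambda$ of $p$, divide off $X-\lambda$ to obtain $q$ and apply $q[T]$ to $v$; the result is nonzero for the degree reason above, and the final check confirms it is an eigenvector for $\lambda$.

```python
import numpy as np

def eigenvector_from_root(T, v, p, lam):
    """Given monic coefficients p (lowest degree first) with p[T](v) = 0 and a
    root lam of p, return q[T](v) where p = (X - lam) q; by the argument above
    this is an eigenvector of T for the eigenvalue lam."""
    d = len(p) - 1
    b = np.zeros(d, dtype=complex)           # coefficients of q, lowest degree first
    b[d - 1] = p[d]                          # leading coefficient stays 1
    for k in range(d - 2, -1, -1):           # synthetic division of p by (X - lam)
        b[k] = p[k + 1] + lam * b[k + 1]
    w, out = v.astype(complex), np.zeros(len(v), dtype=complex)
    for k in range(d):                       # q[T](v) = sum_k b_k T^k v
        out += b[k] * w
        w = T @ w
    return out

T = np.array([[2.0, 1.0], [0.0, 3.0]])
v = np.array([0.0, 1.0])
p = np.array([6.0, -5.0, 1.0])               # p = X^2 - 5X + 6, with p[T](v) = 0; these are the
                                             # coefficients first_dependency(T, v) above would find
lam = np.roots(p[::-1])[0]                   # a root of p (np.roots wants highest degree first)
x = eigenvector_from_root(T, v, p, lam)
print(np.allclose(T @ x, lam * x))           # True: x is an eigenvector for lam = 3
```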
You're on the right track. The map from "polynomial over complex variable" to "polynomial over $T$" is linear, but it also preserves multiplication: if $h(x) = f(x)g(x)$ as polynomials in $x$, then $h(T) = f(T)g(T)$ as operators. So if the polynomial factors in terms of $x$, it factors the same way in terms of $T$.