Showing that the minimal polynomial of an $n \times n$ matrix has degree at most $n$ without using the Cayley-Hamilton Theorem

Let $v$ be a nonzero vector in $V=k^n$. Let $m$ be the least positive integer with $A^mv$ linearly dependent on $v$, $Av,\ldots, A^{m-1}v$. Then there is a monic $f$ of degree $m$ with $f(A)v=0$. Therefore $f(A)A^jv=A^jf(A)v=0$ for $0\le j\le m - 1$ so that $f(A)$ has nullity $\ge m$. Let $W=f(A)V$. The dimension of $W$ is $\le n-m$. Then $A$ acts on $W$ and inductively $g(A)W=0$ for some monic polynomial $g$ of degree $\le n-m$. Therefore $gf$ has degree $\le n$ and annihilates $V$.


The most elementary way to prove this is given in the other answer, and can be summarised as follows: (1) for every vector $v$ and for the minimal monic polynomial $P$ such that $P[T](v)=0$ one has $\deg(P)\leq\dim(\ker(P[T]))$ (since that kernel contains the cyclic submodule generated by $v$), and (2) since the remaining factor $Q=\mu/P$ of the minimal polynomial$~\mu$ must annihilate precisely the image$~W$ of $P[T]$, we are reduced to showing that $\deg(Q)\leq\dim(W)$, which induction on the dimension conveniently does for us.

Just for reference I would like to mention another another possible approach, by reducing the question to one in a simplified situation, where it becomes fairly obvious. Since the minimal polynomial$~\mu$ does not change if we extend scalars to a larger field (i.e., consider the matrix of $T$ as one over a larger field), we may assume that $\mu$ factors into factors of the form $(X-\lambda)^m$ that are mutually relatively prime. The space then factors (canonically) into a direct sum of the kernels of the corresponding endomorphisms $(T-\lambda I)^m$ (the generalised eigenspaces), and it suffices to prove the result for each summand separately (as the product of the minimal polynomials of the restrictions of $T$ to the summands gives the global minimal polynomial$~\mu$). Since by definition $T'-\lambda I$ is nilpotent for such a restriction$~T'$, we are reduced to showing:

For a nilpotent operator$~N$ on a finite dimensional vectors space$~V$, its order$~m$ of nilpotency does not exceed $\dim(V)$.

This is fairly obvious, for instance because the subspaces $\ker(N^i)$ for $i=0,1,\ldots,m$ form a strictly increasing chain in $V$ (the successive images by $N$ of a vector of maximal lifetime leave these subspaces one by one). More is in fact true: the sequence of natural numbers $\dim(\ker(N^{i+1})/\ker(N^i))$ for $i=0,1,2,\ldots$ is weakly decreasing, which implies it reaches $0$ for the first time when $i=m$. To prove this, show that $T$ induces an injective linear map $\ker(N^{i+2})/\ker(N^{i+1})\to\ker(N^{i+1})/\ker(N^i)$.