Connection between algebraic multiplicity and dimension of generalized eigenspace
Solution 1:
Don't use the Cayley-Hamilton theorem; it is less elementary than what you need. In any case, in Axler's book it (8.20) comes after the results you are asking about (8.10, 8.18). In fact, Axler defines the (algebraic) multiplicity of $\lambda$ as $\dim(G_\lambda)$ and then defines the characteristic polynomial to be the product over the eigenvalues$~\lambda$ of $(X-\lambda)^{\dim(G_\lambda)}$. That is a crazy thing to do, born of an irrational fear of determinants, but it does make the question you ask void of content in the context of that book.
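For a concrete illustration (the matrix is my own choice, not taken from the question), let $T$ act on $\mathbb{R}^3$ by the matrix $A$ below. One checks that $G_2=\ker(A-2I)^2=\operatorname{span}(e_1,e_2)$ and $G_3=\ker(A-3I)=\operatorname{span}\big((5,1,1)\big)$, so Axler's definition produces the same polynomial as the determinant:

$$A=\begin{pmatrix}2&1&4\\0&2&1\\0&0&3\end{pmatrix},\qquad \det(XI-A)=(X-2)^2(X-3)=(X-2)^{\dim G_2}\,(X-3)^{\dim G_3}.$$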
I suppose you know that, given a direct sum decomposition into invariant subspaces, the characteristic polynomial of $T$ is the product of the characteristic polynomials of its restrictions to those subspaces. I will also assume you know that the characteristic polynomial of the restriction of $T$ to $G_\lambda$ is $(X-\lambda)^{\dim(G_\lambda)}$. Both facts are quite obvious if you define the characteristic polynomial using determinants (for the first, use a block diagonal matrix on a basis adapted to the decomposition; for the second, use that the restriction has a triangular matrix with every diagonal entry equal to $\lambda$ on an appropriate basis). Now you will be done if you can show that $G_\lambda$ is a factor in a direct sum decomposition into two invariant subspaces, where (the restriction of $T$ to) the other factor does not have $\lambda$ as an eigenvalue: the characteristic polynomial of $T$ is then $(X-\lambda)^{\dim(G_\lambda)}$ times a polynomial that does not vanish at $\lambda$, so the algebraic multiplicity of $\lambda$ is exactly $\dim(G_\lambda)$.
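As a sanity check of the statement being proved, here is a small Python sketch (my own illustration, not part of the argument) that computes, with exact rational arithmetic, both the multiplicity of $\lambda$ as a root of $\det(XI-A)$ and $\dim G_\lambda=n-\operatorname{rank}(A-\lambda I)^n$ for an $n\times n$ matrix $A$. The characteristic polynomial is obtained with the Faddeev-LeVerrier recurrence; the helper names and the example matrix are assumptions chosen for illustration.

```python
from fractions import Fraction

# Helper names and the example matrix are illustrative assumptions.


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matpow(M, e):
    result = identity(len(M))
    for _ in range(e):
        result = matmul(result, M)
    return result


def rank(M):
    """Rank via Gauss-Jordan elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r


def char_poly(A):
    """Coefficients of det(X*I - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I,  c_{n-k} = -tr(A M_k) / k
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


def algebraic_multiplicity(A, lam):
    """How many times (X - lam) divides det(X*I - A), by repeated synthetic division."""
    p, m = char_poly(A), 0
    while len(p) > 1:
        acc, quotient = Fraction(0), []
        for a in p:
            acc = acc * lam + a
            quotient.append(acc)
        if quotient.pop() != 0:  # the last value is the remainder p(lam)
            break
        p, m = quotient, m + 1
    return m


def dim_generalized_eigenspace(A, lam):
    """dim G_lam = dim ker (A - lam*I)^n = n - rank (A - lam*I)^n."""
    n = len(A)
    N = [[Fraction(x) - (lam if i == j else 0) for j, x in enumerate(row)]
         for i, row in enumerate(A)]
    return n - rank(matpow(N, n))


A = [[2, 1, 4], [0, 2, 1], [0, 0, 3]]
for lam in (2, 3, 7):
    # prints "2 2 2", then "3 1 1", then "7 0 0"
    print(lam, algebraic_multiplicity(A, lam), dim_generalized_eigenspace(A, lam))
```

Using `Fraction` keeps every step exact, so the rank and root tests are not blurred by floating-point round-off.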
There are two approaches to proving that fact. The one related to the primary decomposition theorem is to write the minimal polynomial~$\mu$ of$~T$ (or any polynomial annihilating $T$) as a product $\mu=(X-\lambda)^dQ$ of a power of $X-\lambda$ and a factor$~Q$ relatively prime to it. Then, using Bézout coefficients $B,C$ of these two factors (so $1=B(X-\lambda)^d+CQ$), one can find certain polynomials in $T$, namely $(CQ)[X:=T]$ and $\def\Id{\mathrm{id}}B[X:=T](T-\lambda\Id)^d$, that give projections onto the kernels of $(T-\lambda\Id)^d$ and of $Q[X:=T]$ respectively; these kernels therefore form a direct sum decomposition. The kernel associated to the factor $(X-\lambda)^d$ is in fact $G_\lambda$, and since $Q(\lambda)\neq0$, the restriction of $T$ to the kernel of $Q[X:=T]$ does not have $\lambda$ as an eigenvalue. (This approach does not even explicitly depend on the space being finite dimensional, though finite dimensionality is what guarantees that an annihilating polynomial exists in the first place.)
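The Bézout construction can be made concrete. The following Python sketch is an illustration under my own assumptions (the helper names, the matrix, and the choice $\lambda=2$ are mine): it runs the extended Euclidean algorithm on $(X-2)^2$ and $Q=X-3$, whose product $(X-2)^2(X-3)$ is the minimal polynomial of the example matrix, and then evaluates $CQ$ and $B(X-2)^2$ at the matrix to obtain the two complementary projections.

```python
from fractions import Fraction

# Polynomials: lists of Fraction coefficients, lowest degree first; [] is zero.
# Matrices: lists of rows. All helper names here are illustrative.


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_sub(p, q):
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def poly_divmod(p, q):
    """Long division p = quot*q + rem with deg rem < deg q (q nonzero)."""
    p = trim(p)
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    while len(p) >= len(q):
        coef, shift = p[-1] / q[-1], len(p) - len(q)
        quot[shift] = coef
        for i, c in enumerate(q):
            p[i + shift] -= coef * c
        p = trim(p)
    return trim(quot), p


def ext_gcd(a, b):
    """Extended Euclid: return (g, s, t) with s*a + t*b == g and g monic."""
    r0, r1 = trim(a), trim(b)
    s0, s1, t0, t1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1))
        t0, t1 = t1, poly_sub(t0, poly_mul(q, t1))
    lead = r0[-1]
    return [c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def eval_at_matrix(p, A):
    """Horner's rule: result <- result*A + c*I for each coefficient c, highest first."""
    result = [[Fraction(0)] * len(A) for _ in A]
    for c in reversed(p):
        result = matmul(result, A)
        for i in range(len(A)):
            result[i][i] += c
    return result


# Example (an assumption): minimal polynomial (X-2)^2 (X-3), lambda = 2, d = 2.
A = [[Fraction(x) for x in row] for row in [[2, 1, 4], [0, 2, 1], [0, 0, 3]]]
power = [Fraction(4), Fraction(-4), Fraction(1)]  # (X - 2)^2
Q = [Fraction(-3), Fraction(1)]                    # X - 3
g, B, C = ext_gcd(power, Q)                        # B*(X-2)^2 + C*Q == 1
P_lam = eval_at_matrix(poly_mul(C, Q), A)          # projection onto ker (A - 2I)^2
P_other = eval_at_matrix(poly_mul(B, power), A)    # projection onto ker Q(A)

assert g == [1]
assert matmul(P_lam, P_lam) == P_lam and matmul(P_other, P_other) == P_other
print([[int(x) for x in row] for row in P_lam])    # [[1, 0, -5], [0, 1, -1], [0, 0, 0]]
```

Here the Bézout coefficients come out as $B=1$ and $C=1-X$, so the first projection is $I-(A-2I)^2$ and the second is $(A-2I)^2$; their sum is the identity, as the decomposition requires.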
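But there is a more elementary approach: if $G_\lambda$ is the kernel of $(T-\lambda\Id)^d$ with $d=\dim(G_\lambda)$, then the image of $(T-\lambda\Id)^d$ provides an invariant complementary factor (it is invariant because $T$ commutes with $(T-\lambda\Id)^d$). The kernel and the image intersect only in $\{0\}$: if $v=(T-\lambda\Id)^dw$ were a nonzero vector in both, then $w$ would not be annihilated by $(T-\lambda\Id)^d$ but would be annihilated by the higher power $(T-\lambda\Id)^{2d}$, which contradicts what you ought to know of generalised eigenspaces (every vector annihilated by some power of $T-\lambda\Id$ is already annihilated by $(T-\lambda\Id)^d$). But then the two subspaces are complementary by the rank-nullity theorem, and form a direct sum decomposition. Finally, $\lambda$ is not an eigenvalue of the restriction of $T$ to the image, since every eigenvector for $\lambda$ lies in $G_\lambda$, which meets the image only in $\{0\}$.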
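The dimension count in this argument can also be checked by machine. The sketch below is my own illustration (the helper names and the example matrix are assumptions); it relies on the fact that, for $N=(A-\lambda I)^k$, the kernel and image of $N$ meet only in $\{0\}$ exactly when $\operatorname{rank}N^2=\operatorname{rank}N$, because $N$ maps its image onto the image of $N^2$. With $k=d=\dim G_\lambda$ the test succeeds, while a smaller exponent can fail.

```python
from fractions import Fraction

# Helper names and the example matrix are illustrative assumptions.


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matpow(M, e):
    result = identity(len(M))
    for _ in range(e):
        result = matmul(result, M)
    return result


def rank(M):
    """Rank via Gauss-Jordan elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r


def shifted(A, lam):
    """A - lam*I with exact entries."""
    return [[Fraction(x) - (lam if i == j else 0) for j, x in enumerate(row)]
            for i, row in enumerate(A)]


def dim_generalized_eigenspace(A, lam):
    return len(A) - rank(matpow(shifted(A, lam), len(A)))


def kernel_image_split(A, lam, k):
    """For N = (A - lam*I)^k return (dim ker N, dim im N, ker N and im N meet only in 0).

    The last entry uses: N restricted to im N is injective iff rank(N^2) == rank(N).
    """
    N = matpow(shifted(A, lam), k)
    r = rank(N)
    return len(A) - r, r, rank(matmul(N, N)) == r


A = [[2, 1, 4], [0, 2, 1], [0, 0, 3]]
d = dim_generalized_eigenspace(A, 2)
print(d, kernel_image_split(A, 2, d))  # 2 (2, 1, True)
print(kernel_image_split(A, 2, 1))     # (1, 2, False): exponent below d
```

In this example the image of $(A-2I)^2$ is spanned by $(5,1,1)$, an eigenvector for the eigenvalue $3$, so the complementary factor indeed does not have $2$ as an eigenvalue.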