I think a very useful notion here is the idea of a "generalized eigenvector".

An eigenvector of a matrix $A$ is a vector $v$ with associated value $\lambda$ such that $$ (A-\lambda I)v=0 $$ A generalized eigenvector, on the other hand, is a vector $w$ with the same associated value such that $$ (A-\lambda I)^kw=0 $$ That is, $(A-\lambda I)$ is nilpotent on $w$. Or, in other words: $$ (A - \lambda I)^{k-1}w=v $$ For some eigenvector $v$ with the same associated value.


Now, let's see how this definition helps us with a non-diagonalizable matrix such as $$ A = \pmatrix{ 2 & 1\\ 0 & 2 } $$ For this matrix, we have $\lambda=2$ as a unique eigenvalue, and $v=\pmatrix{1\\0}$ as the associated eigenvector, which I will let you verify. $w=\pmatrix{0\\1}$ is our generalized eiegenvector. Notice that $$ (A - 2I) = \pmatrix{ 0 & 1\\ 0 & 0} $$ Is a nilpotent matrix of order $2$. Note that $(A - 2I)v=0$, and $(A- 2I)w=v$ so that $(A-2I)^2w=0$. But what does this mean for what the matrix $A$ does? The behavior of $v$ is fairly obvious, but with $w$ we have $$ Aw = \pmatrix{1\\2}=2w + v $$ So $w$ behaves kind of like an eigenvector, but not really. In general, a generalized eigenvector, when acted upon by $A$, gives another vector in the generalized eigenspace.


An important related notion is Jordan Normal Form. That is, while we can't always diagonalize a matrix by finding a basis of eigenvectors, we can always put the matrix into Jordan normal form by finding a basis of generalized eigenvectors/eigenspaces.

I hope that helps. I'd say that the most important thing to grasp from the idea of generalized eigenvectors is that every transformation can be related to the action of a nilpotent over some subspace.


Edit: The algebra I speak of here is not actually the Grassmann numbers at all -- they are $\mathbb{R}[X]/(X^n)$, whose generators don't satisfy the anticommutativity relation even though they satisfy all the nilpotency relations. The dual-number stuff for 2 by 2 is still correct, just ignore my use of the word "Grassmann".


Non-diagonalisable 2 by 2 matrices can be diagonalised over the dual numbers -- and the "weird cases" like the Galilean transformation are not fundamentally different from the nilpotent matrices.

The intuition here is that the Galilean transformation is sort of a "boundary case" between real-diagonalisability (skews) and complex-diagonalisability (rotations) (which you can sort of think in terms of discriminants). In the case of the Galilean transformation $\left[\begin{array}{*{20}{c}}{1}&{v}\\{0}&{1}\end{array}\right]$, it's a small perturbation away from being diagonalisable, i.e. it sort of has "repeated eigenvectors" (you can visualise this with MatVis). So one may imagine that the two eigenvectors are only an "epsilon" away, where $\varepsilon$ is the unit dual satisfying $\varepsilon^2=0$ (called the "soul"). Indeed, its characteristic polynomial is:

$$(\lambda-1)^2=0$$

Whose solutions among the dual numbers are $\lambda=1+k\varepsilon$ for real $k$. So one may "diagonalise" the Galilean transformation over the dual numbers as e.g.:

$$\left[\begin{array}{*{20}{c}}{1}&{0}\\{0}&{1+v\varepsilon}\end{array}\right]$$

Granted this is not unique, this is formed from the change-of-basis matrix $\left[\begin{array}{*{20}{c}}{1}&{1}\\{0}&{\epsilon}\end{array}\right]$, but any vector of the form $(1,k\varepsilon)$ is a valid eigenvector. You could, if you like, consider this a canonical or "principal value" of the diagonalisation, and in general each diagonalisation corresponds to a limit you can take of real/complex-diagonalisable transformations. Another way of thinking about this is that there is an entire eigenspace spanned by $(1,0)$ and $(1,\varepsilon)$ in that little gap of multiplicity. In this sense, the geometric multiplicity is forced to be equal to the algebraic multiplicity*.

Then a nilpotent matrix with characteristic polynomial $\lambda^2=0$ has solutions $\lambda=k\varepsilon$, and is simply diagonalised as:

$$\left[\begin{array}{*{20}{c}}{0}&{0}\\{0}&{\varepsilon}\end{array}\right]$$

(Think about this.) Indeed, the resulting matrix has minimal polynomial $\lambda^2=0$, and the eigenvectors are as before.


What about higher dimensional matrices? Consider:

$$\left[ {\begin{array}{*{20}{c}}0&v&0\\0&0&w\\0&0&0\end{array}} \right]$$

This is a nilpotent matrix $A$ satisfying $A^3=0$ (but not $A^2=0$). The characteristic polynomial is $\lambda^3=0$. Although $\varepsilon$ might seem like a sensible choice, it doesn't really do the trick -- if you try a diagonalisation of the form $\mathrm{diag}(0,v\varepsilon,w\varepsilon)$, it has minimal polynomial $A^2=0$, which is wrong. Indeed, you won't be able to find three linearly independent eigenvectors to diagonalise the matrix this way -- they'll all take the form $(a+b\varepsilon,0,0)$.

Instead, you need to consider a generalisation of the dual numbers, called the Grassmann numbers, with the soul satisfying $\epsilon^n=0$. Then the diagonalisation takes for instance the form:

$$\left[ {\begin{array}{*{20}{c}}0&0&0\\0&{v\epsilon}&0\\0&0&{w\epsilon}\end{array}} \right]$$


*Over the reals and complexes, when one defines algebraic multiplicity (as "the multiplicity of the corresponding factor in the characteristic polynomial"), there is a single eigenvalue corresponding to that factor. This is of course no longer true over the Grassmann numbers, because they are not a field, and $ab=0$ no longer implies "$a=0$ or $b=0$".

In general, if you want to prove things about these numbers, the way to formalise them is by constructing them as the quotient $\mathbb{R}[X]/(X^n)$, so you actually have something clear to work with.

(Perhaps relevant: Grassmann numbers as eigenvalues of nilpotent operators? -- discussing the fact that the Grassmann numbers are not a field).

You might wonder if this sort of approach can be applicable to LTI differential equations with repeated roots -- after all, their characteristic matrices are exactly of this Grassmann form. As pointed out in the comments, however, this diagonalisation is still not via an invertible change-of-basis matrix, it's still only of the form $PD=AP$, not $D=P^{-1}AP$. I don't see any way to bypass this. See my posts All matrices can be diagonalised (a re-post of this answer) and Repeated roots of differential equations for ideas, I guess.