Proof of the general state-space similarity transformation to controllable canonical form
Given a state space model of the form,
$$ \begin{align} \dot{x} &= A\,x + B\,u \\ y &= C\,x + D\,u \end{align} \tag{1} $$
(I think the same derivation would also apply to a discrete-time model.)
Assuming that this state-space model is controllable, I would like to find a nonsingular similarity transformation $z=T\,x$ that brings it to the form,
$$ \begin{align} \dot{z} &= \underbrace{T\,A\,T^{-1}}_{\bar{A}}\,z + \underbrace{T\,B}_{\bar{B}}\,u \\ y &= \underbrace{C\,T^{-1}}_{\bar{C}}\,z + \underbrace{D}_{\bar{D}}\,u \end{align} \tag{2} $$
such that it is in the controllable canonical form with,
$$ \bar{A} = \begin{bmatrix} 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \\ -a_n & -a_{n-1} & \cdots & -a_2 & -a_1 \end{bmatrix} \tag{3a} $$
$$ \bar{B} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{3b} $$
When $A$ is in the Jordan canonical form, with Jordan blocks of size at most one by one (so no off-diagonal terms),
$$ A = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \tag{4} $$
where each eigenvalue has algebraic multiplicity of at most one. Each state of matrix $(3a)$ can be seen as the integral of the next state, with the derivative of the last state a linear combination of all states; therefore it can be shown that similarity transforms of the form,
$$ T = \left[\begin{array}{c c} \alpha_1 \begin{pmatrix} 1 \\ \lambda_1 \\ \lambda_1^2 \\ \vdots \\ \lambda_1^{n-1} \end{pmatrix} & \alpha_2 \begin{pmatrix} 1 \\ \lambda_2 \\ \lambda_2^2 \\ \vdots \\ \lambda_2^{n-1} \end{pmatrix} & \cdots & \alpha_n \begin{pmatrix} 1 \\ \lambda_n \\ \lambda_n^2 \\ \vdots \\ \lambda_n^{n-1} \end{pmatrix} \end{array}\right] \tag{5} $$
would bring $(4)$ to $(3a)$. The values of $\alpha_i$ can be solved for using $\bar{B}=T\,B$ and $(3b)$: defining $B$ as,
$$ B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \tag{6} $$
then this equality can be written as,
$$ \begin{bmatrix} b_1 & b_2 & \cdots & b_n \\ \lambda_1\,b_1 & \lambda_2\,b_2 & \cdots & \lambda_n\,b_n \\ \lambda_1^2\,b_1 & \lambda_2^2\,b_2 & \cdots & \lambda_n^2\,b_n \\ \vdots & \vdots & \cdots & \vdots \\ \lambda_1^{n-1}\,b_1 & \lambda_2^{n-1}\,b_2 & \cdots & \lambda_n^{n-1}\,b_n \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{7} $$
It can be noted that in this case the matrix in equation $(7)$ is the same as the transpose of the controllability matrix,
$$ \mathcal{C} = \begin{bmatrix}B & A\,B & A^2B & \cdots & A^{n-1}B\end{bmatrix} \tag{8} $$
so the solution to equation $(7)$ can also be written as,
$$ \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix} = \mathcal{C}^{-T} \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{9a} $$
$$ \vec{\alpha} = \mathcal{C}^{-T} \bar{B} \tag{9b} $$
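As a numerical sketch of the construction so far (the eigenvalues and the entries of $B$ below are arbitrary, made-up values), the following NumPy snippet solves $(7)$ for $\vec{\alpha}$, builds $T$ from $(5)$, and checks that it produces $(3a)$ and $(3b)$:

```python
import numpy as np

# Sketch with arbitrary, made-up values: A diagonal with distinct
# eigenvalues, B with all entries nonzero (so (A, B) is controllable).
lam = np.array([-1.0, -2.0, -3.0])   # eigenvalues lambda_i of A
b = np.array([1.0, 1.0, 2.0])        # entries b_i of B
n = len(lam)

A = np.diag(lam)
B = b.reshape(n, 1)

# Controllability matrix (8); its transpose is the matrix in (7)
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

# alpha = C^{-T} Bbar  (equation (9b))
e_n = np.zeros(n)
e_n[-1] = 1.0
alpha = np.linalg.solve(C.T, e_n)

# T from (5): column i is alpha_i * (1, lambda_i, ..., lambda_i^{n-1})^T
V = np.vander(lam, n, increasing=True).T
T = V * alpha                        # scales column i by alpha_i

Abar = T @ A @ np.linalg.inv(T)      # should be the companion form (3a)
Bbar = T @ B                         # should be (0, ..., 0, 1)^T
print(np.round(Abar, 6))
print(np.round(Bbar, 6))
```

With these eigenvalues the characteristic polynomial is $(\lambda+1)(\lambda+2)(\lambda+3)=\lambda^3+6\lambda^2+11\lambda+6$, so the last row of $\bar{A}$ should come out as $(-6,\,-11,\,-6)$.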
The transpose of $T$ can, similar to equation $(7)$, also be written as,
$$ T^T = \begin{bmatrix}\vec{\alpha} & A\,\vec{\alpha} & A^2\vec{\alpha} & \cdots & A^{n-1}\vec{\alpha}\end{bmatrix} \tag{10} $$
or, defining a new row vector $\vec{v}$ as the transpose of $\vec{\alpha}$ and substituting the right hand side of equation $(9b)$ for $\vec{\alpha}$,
$$ \vec{v} = \begin{bmatrix}0 & \cdots & 0 & 1\end{bmatrix} \mathcal{C}^{-1} \tag{11a} $$
$$ T = \begin{bmatrix} \vec{v} \\ \vec{v}\, A \\ \vec{v}\, A^2 \\ \vdots \\ \vec{v}\, A^{n-1} \end{bmatrix} \tag{11b} $$
From this expression it can also be seen that if $\mathcal{C}$ is not full-rank, then such a transformation would not exist.
After some testing, this expression also seems to hold for any $A$ and $B$, as long as $\mathcal{C}$ is full-rank/invertible, but in that case equation $(10)$ should contain $A^T$ instead of $A$ (when using equation $(4)$, $A=A^T$, so the two agree). However, I do not know how I could go about proving that this is always the case.
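For what it is worth, the observation can be checked numerically. The sketch below (arbitrary, made-up matrices with $(A,B)$ controllable, and $A$ deliberately non-symmetric) builds $T$ from equations $(11a)$ and $(11b)$ and verifies the result:

```python
import numpy as np

# Numerical check of (11a)-(11b) for a general, non-symmetric A;
# the matrices below are arbitrary made-up values.
A = np.array([[1.0,  2.0,  0.0],
              [3.0, -1.0,  1.0],
              [0.0,  2.0, -2.0]])
B = np.array([[1.0], [0.0], [1.0]])
n = A.shape[0]

# Controllability matrix (8)
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
assert np.linalg.matrix_rank(C) == n   # (A, B) must be controllable

# v = [0 ... 0 1] C^{-1}, i.e. the last row of C^{-1}  (equation (11a))
v = np.linalg.inv(C)[-1, :]

# T = [v; vA; ...; vA^{n-1}]  (equation (11b))
T = np.vstack([v @ np.linalg.matrix_power(A, k) for k in range(n)])

Abar = T @ A @ np.linalg.inv(T)   # companion form; last row holds the -a_i
Bbar = T @ B                      # (0, ..., 0, 1)^T
print(np.round(Abar, 6))
print(np.round(Bbar, 6))
```

For this particular $A$ the characteristic polynomial is $\lambda^3+2\lambda^2-9\lambda-12$, so the last row of $\bar{A}$ should be $(12,\,9,\,-2)$.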
Also a small side question: How could one define this transformation when $B$ is of size $n$ by $m$, with $m>1$? I suspect that in the controllable canonical form $\bar{B}$ should be of the form,
$$ \bar{B} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & \cdots & \vdots \\ 0 & \cdots & 0 \\ 1 & \cdots & 1 \end{bmatrix} \tag{12} $$
Solution 1:
For a single-input system the transformation that yields the controller canonical form is
$$ T = \begin{bmatrix} q \\ q\,A \\ \vdots \\ q\,A^{n-1} \end{bmatrix} $$
where $q$ is the last row of the inverse of the controllability matrix, i.e.
$$ \mathcal{C}^{-1} = \left[\begin{array}{c} X \\ \hline q \end{array}\right] $$
This property ensures that
$$ q\,A^{i-1}\,b = \begin{cases} 0, & i = 1,\dots,n-1 \\ 1, & i = n \end{cases} $$
which can be used along with the Cayley–Hamilton theorem to prove that
$$ T\,b = \begin{bmatrix} q\,b \\ \vdots \\ q\,A^{n-2}\,b \\ q\,A^{n-1}\,b \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} = \bar{B} $$
and
$$ T\,A = \begin{bmatrix} q\,A \\ \vdots \\ q\,A^{n-1} \\ q\,A^{n} \end{bmatrix} = \begin{bmatrix} q\,A \\ \vdots \\ q\,A^{n-1} \\ -q\sum_{i=1}^{n} a_{n-i+1}\,A^{i-1} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix} \begin{bmatrix} q \\ q\,A \\ \vdots \\ q\,A^{n-2} \\ q\,A^{n-1} \end{bmatrix} = \bar{A}\,T $$
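The Cayley–Hamilton step in the last display, $A^{n} = -\sum_{i=1}^{n} a_{n-i+1}\,A^{i-1}$, can be sanity-checked numerically; this sketch uses an arbitrary $2\times 2$ example matrix:

```python
import numpy as np

# Check that A^n = -sum_{i=1}^{n} a_{n-i+1} A^{i-1}, where
# lambda^n + a_1 lambda^{n-1} + ... + a_n is the characteristic
# polynomial of A. The matrix A is an arbitrary example.
A = np.array([[ 0.0,  1.0],
              [-6.0, -5.0]])
n = A.shape[0]

a = np.poly(A)[1:]   # [a_1, ..., a_n]; here the polynomial is s^2 + 5s + 6

rhs = -sum(a[n - i] * np.linalg.matrix_power(A, i - 1)
           for i in range(1, n + 1))
print(np.allclose(np.linalg.matrix_power(A, n), rhs))  # True
```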
For the multiple-input case $B\in\mathbb{R}^{n\times m}$ the situation is more complex. The calculation involves the so-called controllability indices $\mu_1,\mu_2,\dots,\mu_m$, and $\bar{B}$ is of the form
$$ \bar{B} = \left[\begin{array}{ccccc} 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \\ 1 & * & * & \cdots & * \\ \hline 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & * & \cdots & * \\ \hline \vdots & \vdots & \vdots & \ddots & \vdots \\ \hline 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 1 \end{array}\right] $$
where $*$ denotes a not necessarily zero element. The $m$ nonzero rows of $\bar{B}$ are rows $\mu_1$, $\mu_1+\mu_2$, $\dots$, $\mu_1+\mu_2+\cdots+\mu_m$. For more details I suggest consulting the book Antsaklis and Michel, "A Linear Systems Primer".
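As a sketch of how the controllability indices can be computed (this is the standard rank-increase column search; the example matrices below are arbitrary, made-up values): scan the columns $b_1,\dots,b_m,\ A b_1,\dots,A b_m,\ A^2 b_1,\dots$ in order and keep each column that increases the rank; $\mu_j$ counts the kept columns coming from input $j$.

```python
import numpy as np

# Controllability indices mu_1, ..., mu_m via the rank-increase search
# over the columns of [B, AB, A^2 B, ...]. Example matrices are made up.
A = np.array([[ 0.0,  1.0,  0.0],
              [ 0.0,  0.0,  1.0],
              [-1.0, -2.0, -3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
n, m = B.shape

selected = np.zeros((n, 0))   # kept, linearly independent columns
mu = [0] * m
for k in range(n):                           # powers A^k, k = 0..n-1
    AkB = np.linalg.matrix_power(A, k) @ B
    for j in range(m):                       # inputs in fixed order
        cand = np.hstack([selected, AkB[:, [j]]])
        if np.linalg.matrix_rank(cand) > selected.shape[1]:
            selected = cand                  # column increased the rank
            mu[j] += 1

print(mu)   # the indices sum to n exactly when (A, B) is controllable
```

For this example the second input needs two powers of $A$ before the search saturates, so the indices come out as $\mu_1=1$, $\mu_2=2$.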