Solution 1:

If the fields are infinite, there is an easy proof.

Let $F \subseteq K$ be a field extension with $F$ infinite, and let $A, B \in \mathrm{M}_n(F)$ be two square matrices that are similar over $K$. Then there is a matrix $M \in \mathrm{GL}_n(K)$ such that $AM = MB$. We can write $$ M = M_1 e_1 + \dots + M_r e_r, $$ where $M_i \in \mathrm{M}_n(F)$ and $\{ e_1, \dots, e_r \}$ is an $F$-linearly independent subset of $K$. Since the entries of the $AM_i$ and $M_iB$ lie in $F$, comparing coefficients of the $e_i$ gives $A M_i = M_i B$ for every $i = 1,\dots, r$.

Consider the polynomial $$ P(t_1, \dots, t_r) = \det( t_1 M_1 + \dots + t_r M_r) \in F[t_1, \dots, t_r ]. $$ Since $\det M \neq 0$, we have $P(e_1, \dots, e_r) \neq 0$, hence $P$ is not the zero polynomial. Since $F$ is infinite, there exist $\lambda_1, \dots, \lambda_r \in F$ such that $P(\lambda_1, \dots, \lambda_r) \neq 0$. Setting $N = \lambda_1 M_1 + \dots + \lambda_r M_r$, we have $N \in \mathrm{GL}_n(F)$ and $A N = N B$, so $A$ and $B$ are similar over $F$.
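Here is a small symbolic sanity check of this argument using sympy, with $F=\mathbb{Q}$ and $K=\mathbb{Q}(\sqrt2)$; the particular matrices $A$, $B$, $M$ are made up for the illustration.

```python
import sympy as sp

A = sp.Matrix([[1, 1], [0, 1]])
B = sp.Matrix([[1, 2], [0, 1]])

# An intertwiner over K = Q(sqrt(2)):  A*M = M*B with det(M) != 0
M = sp.Matrix([[sp.sqrt(2), 1], [0, 2*sp.sqrt(2)]])
assert sp.simplify(A*M - M*B) == sp.zeros(2) and M.det() != 0

# Decompose M = M1*1 + M2*sqrt(2) over the F-basis {1, sqrt(2)} of K
M1 = sp.Matrix([[0, 1], [0, 0]])
M2 = sp.Matrix([[1, 0], [0, 2]])
assert M == M1 + sp.sqrt(2)*M2
# Each coefficient matrix intertwines A and B separately
assert A*M1 == M1*B and A*M2 == M2*B

# P(t1, t2) = det(t1*M1 + t2*M2) is a nonzero polynomial over Q
t1, t2 = sp.symbols('t1 t2')
P = (t1*M1 + t2*M2).det()

# Pick the rational point (t1, t2) = (0, 1): the similarity descends to F = Q
N = 0*M1 + 1*M2
assert N.det() != 0 and A*N == N*B
```

The point $(0,1)$ was found by inspection here; in general any rational point where $P$ does not vanish works, and one exists because $F$ is infinite.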

Solution 2:

THEOREM 1. Let $E$ be a field, let $F$ be a subfield, and let $A$ and $B$ be $n$ by $n$ matrices with coefficients in $F$. If $A$ and $B$ are similar over $E$, they are similar over $F$.

This is an immediate consequence of the following theorem, because the monic gcd of a family of polynomials with coefficients in $F$ is the same whether it is computed in $F[X]$ or in $E[X]$:

THEOREM 2. In the above setting, let $X$ be an indeterminate, and let $g_k(A)\in F[X]$, $1\le k\le n$, be the monic gcd of the determinants of all the $k$ by $k$ submatrices of $X-A$. Then $A$ and $B$ are similar over $F$ if and only if $g_k(A)=g_k(B)$ for all $k$.
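Theorem 2 is easy to check experimentally. The helper `det_divisors` below is my own illustrative sketch (not taken from the references); it computes the monic gcds $g_k$ with sympy, and distinguishes a similar pair from a non-similar pair that shares the same characteristic polynomial.

```python
import sympy as sp
from itertools import combinations
from functools import reduce

x = sp.symbols('x')

def det_divisors(A):
    """g_k = monic gcd of all k-by-k minors of x*I - A, for k = 1..n."""
    n = A.rows
    C = x*sp.eye(n) - A
    gs = []
    for k in range(1, n + 1):
        minors = [C.extract(list(r), list(c)).det()
                  for r in combinations(range(n), k)
                  for c in combinations(range(n), k)]
        gs.append(sp.monic(reduce(sp.gcd, minors), x))
    return gs

A  = sp.Matrix([[1, 1], [0, 1]])   # a Jordan block
B  = sp.Matrix([[1, 2], [0, 1]])   # similar to A (conjugate by diag(1, 2))
I2 = sp.eye(2)                     # same characteristic polynomial as A, not similar
assert det_divisors(A) == det_divisors(B)
assert det_divisors(A) != det_divisors(I2)
```

The second assertion shows why the characteristic polynomial alone ($g_n$) is not enough: for the identity matrix $g_1 = x-1$, while for the Jordan block $g_1 = 1$.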

References:

Basic Algebra I: Second Edition, Jacobson, N., Section 3.10.

A Survey of Modern Algebra, Birkhoff, G. and Mac Lane, S., 2008. In the 1999 edition it was in Section XI.8, titled "The Calculation of Invariant Factors".

Algèbre: Chapitres 4 à 7, Nicolas Bourbaki. Translation: Algebra II.

(I haven't found online references.)

[I'm using the community wiki mode to encourage you to improve this answer.]

Here is the sketch of a proof of Theorem 2.

EDIT [This edit follows Soarer's interesting comment.] Each of the formulas $fv:=f(A)v$ and $fv:=f(B)v$ (for all $f\in F[X]$ and all $v\in F^n$) defines on $F^n$ a structure of finitely generated module over the principal ideal domain $F[X]$. Moreover, $A$ and $B$ are similar if and only if the corresponding modules are isomorphic. The good news is that a wonderful theory of finitely generated modules over a principal ideal domain is freely available to us. END OF EDIT

THEOREM 3. Let $A$ be a principal ideal domain and $V$ a finitely generated $A$-module. Then $V$ is isomorphic to $\oplus_{i=1}^nA/(a_i)$, where the $a_i$ are elements of $A$ satisfying $a_1 | a_2 | \cdots | a_n$. Here $a | b$ means "$a$ divides $b$". Moreover the ideals $Aa_i$ are uniquely determined by these conditions.
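For $A=\mathbb{Z}$, Theorem 3 is the classification of finitely generated abelian groups. Assuming sympy's `smith_normal_form` computes the invariant factors over $\mathbb{Z}$ (which is my understanding of that routine), the divisibility chain $a_1 | a_2 | a_3$ can be seen on a standard example; the relation matrix below is chosen just for illustration.

```python
import sympy as sp
from sympy.matrices.normalforms import smith_normal_form

# Relation matrix of a Z-module presentation  Z^3 / M(Z^3)
M = sp.Matrix([[ 2,  4,   4],
               [-6,  6,  12],
               [10, -4, -16]])
S = smith_normal_form(M)
d = [S[i, i] for i in range(3)]

# Invariant factors satisfy a_1 | a_2 | a_3 ...
assert d[1] % d[0] == 0 and d[2] % d[1] == 0
# ... and their product equals det M up to a unit
assert abs(d[0]*d[1]*d[2]) == abs(M.det())
```

Note that sympy does not normalize the signs of the diagonal entries, so the assertions above only test the chain up to units, which is all Theorem 3 promises.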

Let $K$ be the field of fractions of $A$, and $S$ a submodule of $A^n$. The maximum number of linearly independent elements of $S$ is also the dimension of the vector subspace of $K^n$ generated by $S$. Thus this integer, called the rank of $S$, only depends on the isomorphism class of $S$ and is additive with respect to finite direct sums.

THEOREM 4. (a) $S$ is free of rank $r\le n$.

(b) There is a basis $u_1,\dots,u_n$ of $A^n$ and there are nonzero elements $a_1,\dots,a_r$ of $A$ such that $a_1u_1,\dots,a_ru_r$ is a basis of $S$ and $a_1 | a_2 | \cdots | a_r$.

We'll need the

PROPOSITION. Let $A$ be a principal ideal domain and let $f$ be an $A$-valued bilinear map defined on a product of two $A$-modules. Then the image of $f$ is an ideal.

Proof of the Proposition. Let $T$ be the set of all ideals of the form $(f(x,y))$. Since $A$ is noetherian, $T$ has a maximal element; let $(f(x,y))$ be maximal in $T$ and let $(f(u,v))$ be any element of $T$. It suffices to show that $f(x,y) | f(u,v)$.

Claim: $f(x,y) | f(x,v)$ and $f(x,y) | f(u,y)$. Indeed, any generator of $(f(x,y),f(x,v))$ is of the form $$a f(x,y)+bf(x,v)=f(x,ay+bv),$$ and the maximality of $(f(x,y))$ implies $f(x,y) | f(x,v)$. The proof of $f(x,y) | f(u,y)$ is similar.

We can assume $f(x,v)=0=f(u,y)$. [Write $f(x,v)=a f(x,y)$ and replace $v$ by $v-ay$. Similarly for $u$.] Using the equality $$f(a x+b u,y+v)=a f(x,y)+b f(u,v)$$ for all $a$ and $b$ in $A$, and arguing as in the proof of the claim, we see that $f(x,y) | f(u,v)$. QED

Proof of Theorem 4. We assume (as we may) that $S$ is nonzero, we let $f$ be the bilinear form on $A^n$ whose matrix with respect to the canonical basis is the identity matrix, and we pick a generator $a_1=f(s_1,y_1)$ of $f(S\times A^n)$. [Naively: $a_1$ is the gcd of the coordinates of the elements of $S$.] Clearly, $u_1:=s_1/a_1$ is in $A^n$, and we have $$A^n=Au_1\oplus y_1^\perp,\quad S=As_1\oplus(S\cap y_1^\perp),$$ where $y_1^\perp$ is the orthogonal of $y_1$. Then (a) follows by induction on $r$. In particular $y_1^\perp$ and $S\cap y_1^\perp$ are free of rank $n-1$ and $r-1$, and (b) follows also by induction. [Note that if $x$ belongs to a basis of $A^n$, then $f(x,y)=1$ for some $y$ in $A^n$.] QED
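The bracketed remark, that $a_1$ is the gcd of the coordinates of the elements of $S$, can be made concrete over $A=\mathbb{Z}$; the generators below are made up for the example.

```python
from math import gcd

# S a submodule of Z^2 generated by s1 = (4, 6) and s2 = (10, 2).
# For the standard bilinear form f(x, y) = x1*y1 + x2*y2, the ideal
# f(S x Z^2) is generated by the gcd of the coordinates of the generators.
gens = [(4, 6), (10, 2)]
a1 = 0
for v in gens:
    for c in v:
        a1 = gcd(a1, c)
assert a1 == 2

# a1 is attained as a value f(s, y):  f((4, 6), (-1, 1)) = -4 + 6 = 2
s, y = (4, 6), (-1, 1)
assert s[0]*y[0] + s[1]*y[1] == a1
```

This matches the proof: a generator of the ideal $f(S\times A^n)$ is realized by a single pair $(s_1,y_1)$, not merely as a combination of several values.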

Proof of Theorem 3. Let $v_1,\dots,v_n$ be generators of the $A$-module $V$, let $(e_i)$ be the canonical basis of $A^n$, and let $\varphi:A^n\to V$ be the $A$-linear epimorphism mapping $e_i$ to $v_i$. Theorem 4 shows that there is an $r\le n$ and an $A$-linear map $\psi:A^r\to A^n$ such that $\psi(A^r)=\ker\varphi$. This implies that $V$ is isomorphic to $\oplus_{i=1}^nA/(a_i)$, where the $a_i$ are as in Theorem 3. Assume that $V$ is also isomorphic to $\oplus_{i=1}^mA/(b_i)$, where the $b_i$ satisfy the same conditions as the $a_i$. We only need to prove $m=n$ and $(a_i)=(b_i)$ for all $i$. Let $p\in A$ be a prime. By the Chinese Remainder Theorem (proved below), it suffices to prove the above equality in the case where $V$ is the direct sum of a finite family of modules of the form $$V_i:=A/(p^{i+1}).$$ For each $j$ the quotient $p^jV/p^{j+1}V$ is an $A/(p)$ vector space of finite dimension $n_j$. The multiplicity of $A/(p^{i+1})$ in the decomposition is then $n_i-n_{i+1}$.

Here is a way to see this. Form the polynomial $V(X):=\sum\ n_j\ X^j$. We have $$V_i(X)=\frac{X^{i+1}-1}{X-1}=1+X+X^2+\cdots+X^i,$$ and we must solve $\sum\ m_i\ V_i(X)=\sum\ n_j\ X^j$ for the $m_i$, where the $n_j$ are considered as known quantities (almost all equal to zero). Multiplying through by $X-1$ we get $$\sum\ m_{i-1}\ X^i-\sum\ m_i=\sum\ (n_{i-1}-n_i)\ X^i,$$ and comparing the coefficients of $X^i$ for $i\ge1$ gives $m_{i-1}=n_{i-1}-n_i$, which is the formula. QED
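The multiplicity formula is easy to check arithmetically on a small made-up example, say $V=\mathbb{Z}/2\oplus\mathbb{Z}/8$ with $p=2$:

```python
# V = Z/2 ⊕ Z/8, p = 2.  n_j = dim over Z/2 of  2^j V / 2^{j+1} V.
# For a single summand Z/2^e, the quotient 2^j V / 2^{j+1} V is Z/2
# when j < e and zero otherwise, so n_j counts the summands with e > j.
def n(j, exponents):
    return sum(1 for e in exponents if j < e)

exps = [1, 3]                                # exponents of Z/2 and Z/8
ns = [n(j, exps) for j in range(5)]          # the dimensions n_0, n_1, ...
mult = [ns[i] - ns[i+1] for i in range(4)]   # multiplicity of Z/2^{i+1}

assert ns == [2, 1, 1, 0, 0]
assert mult == [1, 0, 1, 0]                  # one copy of Z/2, one of Z/8
```

As expected, the formula $m_i = n_i - n_{i+1}$ recovers exactly one copy of $\mathbb{Z}/2$ ($i=0$) and one of $\mathbb{Z}/8$ ($i=2$).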

Proof of Theorem 2. Let the notation of Theorem 2 be in force. In particular $F$ is a field, and $A$ is now an $n$ by $n$ matrix with coefficients in $F$. The formula $fv:=f(A)v$, for $f\in F[X]$ and $v\in F^n$, defines an $F[X]$-module structure on $F^n$. Let $(b_i)\subset F[X]^n$ and $(e_i)\subset F^n$ be the canonical bases, let $\varphi:F[X]^n\to F^n$ be the $F[X]$-module epimorphism mapping $b_i$ to $e_i$, and let $\psi$ be the $F[X]$-module endomorphism of $F[X]^n$ whose matrix is $X-A$. It is not difficult to check that $\psi(F[X]^n)=\ker\varphi$, as in the proof of Theorem 3. Now Theorem 2 follows from Theorem 3, because the invariant factors $a_1 | \cdots | a_n$ of this module satisfy $a_1\cdots a_k=g_k(A)$, so the $a_i$ and the $g_k(A)$ determine each other. QED

Here is a proof of the Chinese Remainder Theorem (which has been silently used).

CHINESE REMAINDER THEOREM. Let $A$ be a commutative ring and let $I_1,\dots,I_n$ be ideals such that $I_p+I_q=A$ for $p\not=q$. Then the natural morphism from $A$ to the product of the $A/I_p$ is surjective. Moreover the intersection of the $I_p$ coincides with their product.

Proof. Multiplying the equalities $A=I_1+I_p$ for $p=2,\dots,n$ we get $$A=I_1+I_2\cdots I_n.\qquad(*)$$ In particular there is an $a_1$ in $A$ such that $$a_1\equiv1\bmod I_1,\quad a_1\equiv0\bmod I_p\ \forall\ p>1.$$ Similarly we can find elements $a_p$ in $A$ such that $a_p\equiv\delta_{p q}\bmod I_q$ (Kronecker delta). This proves the first claim. Let $I$ be the intersection of the $I_p$. Multiplying (*) by $I$ we get $$I=I_1I+II_2\cdots I_n\subset I_1\ (I_2\cap\cdots\cap I_n)\subset I.$$ This gives the second claim, directly for $n=2$, by induction for $n>2$. QED
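The construction of the elements $a_p$ in the proof is effective. Over $A=\mathbb{Z}$ with $I_p=(m_p)$ it reads as follows (the helper `crt_elements` is illustrative, not from the text):

```python
# Over A = Z with I_p = (m_p), pairwise coprime moduli:
# build a_p ≡ δ_{pq} (mod m_q), as in the proof of the theorem.
def crt_elements(moduli):
    M = 1
    for m in moduli:
        M *= m
    out = []
    for m in moduli:
        rest = M // m                        # lies in the product of the other ideals
        out.append(rest * pow(rest, -1, m))  # ≡ 1 (mod m), ≡ 0 (mod the others)
    return out

mods = [3, 5, 7]
a = crt_elements(mods)
for i, mi in enumerate(mods):
    for j, mj in enumerate(mods):
        assert a[i] % mj == (1 if i == j else 0)

# Surjectivity: solve x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
x = (2*a[0] + 3*a[1] + 2*a[2]) % 105
assert x % 3 == 2 and x % 5 == 3 and x % 7 == 2
```

Here `rest` is a generator of $I_2\cdots I_n$ (relative to $I_1$, say), and multiplying by its inverse modulo $m$ is exactly the step that produces an element congruent to $1$ in the distinguished factor.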

EDIT. Here is a more intrinsic formulation (taken from Bourbaki) of the argument used to prove Theorem 2.

Let $K$ be a commutative ring, let $V$ be a $K$-module, let $f$ be an endomorphism of $V$, let $X$ and $Y$ be indeterminates, and let $V[X]$ be the $K[X]$-module of polynomials in $X$ with coefficients in $V$. [In particular, any $K$-basis of $V$ is a $K[X]$-basis of $V[X]$.] Equip $V$ and $V[X]$ with the $K[X,Y]$-module structures characterized by $$ X^iY^j\cdot v=f^{i+j}v,\quad X^iY^j\cdot X^kv=X^{i+k}f^jv $$ for all $i,j,k$ in $\mathbb N$ and all $v$ in $V$. Let $\phi$ be the unique $K[X,Y]$-linear map from $V[X]$ to $V$ satisfying $\phi(v)=v$ for all $v$ in $V$.

Claim: the sequence $$ 0\to V[X]\xrightarrow{X-Y}V[X]\xrightarrow{\phi}V\to0 $$ is exact.

The only nontrivial inclusion to verify is $\mathrm{Ker}\ \phi\subset\mathrm{Im}(X-Y)$. Let $$ v=\sum_{i\ge0}X^iv_i $$ be in $\mathrm{Ker}\ \phi$. Then $\sum_{i\ge0}Y^iv_i=\sum_{i\ge0}f^iv_i=\phi(v)=0$, so $$ v=\sum_{i\ge0}X^iv_i-\sum_{i\ge0}Y^iv_i=\sum_{i\ge1}\ (X^i-Y^i)\ v_i=(X-Y)\sum_{i\ge1}\sum_{j+k=i-1}X^jY^kv_i. $$
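The last display rests on the telescoping identity $X^i-Y^i=(X-Y)\sum_{j+k=i-1}X^jY^k$, which is easy to confirm symbolically:

```python
import sympy as sp

X, Y = sp.symbols('X Y')
for i in range(1, 8):
    # sum over j + k = i - 1 of X^j * Y^k
    s = sum(X**j * Y**(i - 1 - j) for j in range(i))
    assert sp.expand((X - Y)*s - (X**i - Y**i)) == 0
```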