Why can a non-diagonalizable matrix be approximated by a sequence of diagonalizable matrices?

It is known that any non-diagonalizable matrix $A$ can be approximated by a sequence of diagonalizable matrices, i.e. $A = \lim_{k \rightarrow \infty} A_k$ with each $A_k$ diagonalizable. Why is this true?

Note: I first encountered this in a note about a simple proof of the Cayley–Hamilton theorem, but I was not able to find it in my books or on the internet, to the extent I've googled.


If a matrix has distinct eigenvalues, it can be diagonalized.

You can perturb a matrix by an arbitrarily small amount so that all eigenvalues are distinct.

Given any $A$, let $J$ be its Jordan form. That is, for some invertible $U$, you have $A=U J U^{-1}$.

Let $\Delta= \operatorname{diag}(1,2,...,n)$, where $n$ is the dimension of the matrix, and consider the sequence $A_k = A+ \frac{1}{k} U\Delta U^{-1}$ (thanks to p.s. for catching my mistake here).

Since $A_k = U\left(J + \frac{1}{k}\Delta\right)U^{-1}$ and $J + \frac{1}{k}\Delta$ is upper triangular, the eigenvalues of $A_k$ are $[J]_{ii} + \frac{i}{k}$. Hence for $k$ large enough the eigenvalues are distinct, and so $A_k$ is diagonalizable.

Clearly $A_k \to A$, hence the result.

Note: If we use the Schur form instead, $U$ is unitary, so we can compute the distance explicitly: $\|A-A_k\|_2 = \frac{1}{k}\|\Delta\|_2 = \frac{n}{k}$.
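The construction above is easy to check numerically. The following sketch (using numpy; the matrix and values of $k$ are illustrative, not from the answer) takes a $3\times 3$ Jordan block, which is already in Jordan form so $U = I$, perturbs it by $\frac{1}{k}U\Delta U^{-1}$, and confirms that the eigenvalues become distinct while $A_k \to A$:

```python
import numpy as np

# A 3x3 Jordan block: one eigenvalue (2) and only one eigenvector,
# so A is not diagonalizable.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
n = A.shape[0]
U = np.eye(n)                                  # A is already in Jordan form
Delta = np.diag(np.arange(1, n + 1, dtype=float))

for k in [1, 10, 100, 1000]:
    # A_k = A + (1/k) U Delta U^{-1}: its eigenvalues are 2 + i/k, all distinct.
    A_k = A + (1.0 / k) * U @ Delta @ np.linalg.inv(U)
    eigs = np.linalg.eigvals(A_k)
    dist = np.linalg.norm(A - A_k, 2)          # spectral-norm distance to A
    print(k, np.sort(eigs.real), dist)
```

Since $U = I$ here, the printed distance is exactly $\|\Delta\|_2 / k = 3/k$, matching the Schur-form note above.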


Sorry to resurrect such an old post, but I would like to supply a proof of this without invoking Jordan normal form or Schur decomposition.

So we want to show the following statement.

Theorem. Let $T$ be a linear transformation on a finite-dimensional complex vector space $V$, say on $\mathbb{C}^n$ for simplicity and without loss of generality; given $\epsilon > 0$ there exists a linear map $S$ on $\mathbb{C}^n$ with $\|T - S\| < \epsilon$ so that $S$ is diagonalizable. Here the norm $\|\cdot\|$ is the operator norm, say using the standard Euclidean product $($or any other if one wishes$)$ on $\mathbb{C}^n$.

$($In other words, the diagonalizable linear transformations in a complex finite-dimensional vector space are "generic": fixing any basis, they are dense in the topology of $n \times n$ complex matrices, which is the usual topology on $\mathbb{C}^{n^2}$.$)$

Lemma 1. Let $T$ be a linear transformation on a complex finite-dimensional vector space $V$. Then there exists a basis in $V$ relative to which $T$ is represented by an upper triangular matrix.

Proof. Exercise left to the reader. For the lazy, there are complete proofs in the notes MATH 396. Quotient Spaces and in Axler's Linear Algebra Done Right. $\square$
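Although the written proof here deliberately avoids Schur decomposition, Lemma 1 can be witnessed numerically by it: the complex Schur factorization writes any square matrix as $A = QRQ^{H}$ with $R$ upper triangular and $Q$ unitary, i.e. $A$ is upper triangular in the orthonormal basis given by the columns of $Q$. A minimal sketch (scipy assumed available; the random $4\times 4$ example is hypothetical):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Complex Schur decomposition: A = Q R Q^H, R upper triangular, Q unitary.
R, Q = schur(A, output='complex')
print(np.round(R, 3))   # entries strictly below the diagonal are zero
```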

Lemma 2. For a linear operator $T$ on a vector space $V$ of dimension $n$, let $M(T) = \{t_{ij}\}$ be a matrix for $T$ in some basis. Then $\| T \| \le n \max_{1\le i, j \le n} |t_{ij}|$, where $\|T\| = \sup_{\|x \| = 1} \|Tx\|$.

Proof. Take some vector $x\in V$ with $\|x\| = 1$. Let $T_1, T_2, \dots, T_n$ be the row vectors of $M(T)$, so the $j$-th entry of $Tx$ is $T_j \cdot x$, and by Cauchy–Schwarz $|T_j \cdot x| \le \|T_j\|\,\|x\| = \|T_j\|$. Then $$\| Tx \| = \|(T_1 \cdot x \quad T_2 \cdot x \quad \dots \quad T_n \cdot x) \| \le \|(\|T_1\| \quad \|T_2\| \quad \dots \quad \|T_n\|)\| $$ $$ = \sqrt{\|T_1\|^2 + \|T_2\|^2 + \dots + \|T_n\|^2} \le \sqrt{n} \max_{1\le j \le n} \|T_j\| $$ $$=\sqrt{n} \max_{1\le j\le n} \sqrt{|t_{j1}|^2 + |t_{j2}|^2 + \dots +|t_{jn}|^2} \le \sqrt{n}\cdot\sqrt{n} \max_{1\le i,j \le n} |t_{ij}| = n \max_{1\le i, j \le n} |t_{ij}|.$$

$\tag*{$\square$}$
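A quick numerical sanity check of Lemma 2 (the random complex matrix and its size are illustrative choices, not from the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

op_norm = np.linalg.norm(M, 2)     # operator (spectral) norm: sup ||Mx||, ||x||=1
bound = n * np.max(np.abs(M))      # Lemma 2 bound: n * max |t_ij|
print(op_norm, "<=", bound)
```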

Proof of the Theorem. Fix $\epsilon > 0$ and let $n$ be the dimension of $V$. By Lemma $1$, there is some basis in which the matrix $M(T) = \{t_{ij}\}$ of $T$ is upper triangular; we will work in this basis. Construct a new matrix $M(S) = \{s_{ij}\}$ as follows. Let $s_{ij} = 0$ whenever $i>j$. For each ordered pair $(i,j)$ with $i \le j$, choose some complex number $s_{ij}$ such that $|s_{ij} - t_{ij}| < \epsilon/n$, with the added constraint that the $s_{ii}$ are pairwise distinct for $1\le i \le n$ $($this is clearly possible$)$. Let $S$ be the linear transformation determined by $M(S)$.

Since $|s_{ij} - t_{ij}| < \epsilon/n$ for all $i$, $j$ $($when $i>j$, $s_{ij} - t_{ij} = 0$$)$, Lemma $2$ gives $$\|T - S\| \le n \max_{1\le i, j \le n} |s_{ij} - t_{ij}| < \epsilon.$$

Now consider the characteristic polynomial of the upper triangular matrix $M(S)$, $$p(\lambda) = (s_{11} - \lambda)(s_{22} - \lambda)\cdots(s_{nn} - \lambda).$$ Since the $s_{ii}$ are distinct, $S$ has $n$ distinct eigenvalues, and therefore $S$ is diagonalizable. We conclude that for every $\epsilon > 0$ there is a diagonalizable linear map $S$ with $\|T - S\| < \epsilon$. $\square$
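The construction in the proof can be sketched numerically. Here I use the Schur decomposition to produce the triangular basis (the written proof avoids Schur, but numerically it is a convenient way to triangularize in an orthonormal basis, which also makes the operator norm easy to track); the example matrix, seed, and perturbation sizes are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import schur

eps = 1e-3
rng = np.random.default_rng(1)

# A non-diagonalizable T: a 2x2 Jordan block conjugated into a random basis.
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
P = rng.standard_normal((2, 2))
T = P @ J @ np.linalg.inv(P)
n = T.shape[0]

# Triangularize: T = Q R Q^H with R upper triangular, Q unitary.
R, Q = schur(T, output='complex')

# Perturb only the diagonal of R so its entries become pairwise distinct,
# moving each entry by less than eps/n, as in the proof.
S_tri = R.copy()
S_tri[np.diag_indices(n)] += (eps / (2 * n)) * np.arange(1, n + 1)
S = Q @ S_tri @ Q.conj().T

# S is within eps of T and has n distinct eigenvalues (the diagonal of S_tri),
# hence S is diagonalizable.
print(np.linalg.norm(T - S, 2), np.diag(S_tri))
```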