Prove that the Sylvester equation has a unique solution when $A$ and $-B$ share no eigenvalues

This does not exactly answer the original question but provides an alternative proof that seems simpler than the one on Wikipedia as of October 2, 2020.

Theorem. Given matrices $A\in \mathbb{C}^{m\times m}$ and $B\in \mathbb{C}^{n\times n}$, the Sylvester equation $AX-XB=C$ has a unique solution $X\in \mathbb{C}^{m\times n}$ for any $C\in\mathbb{C}^{m\times n}$ if and only if $A$ and $B$ do not share any eigenvalue.

Proof. The equation $AX-XB=C$ is a linear system with $mn$ unknowns and the same number of equations. Hence it is uniquely solvable for any given $C$ if and only if the homogeneous equation $$ AX-XB=0 $$ admits only the trivial solution $X=0$.
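This linear-system viewpoint can be made explicit with the Kronecker product: under column-major vectorization, $AX-XB=C$ reads $(I_n \otimes A - B^{\top} \otimes I_m)\,\operatorname{vec}(X) = \operatorname{vec}(C)$. A minimal numerical sketch (assuming NumPy; the random matrices below generically share no eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

# vec(AX - XB) = (I_n (x) A - B^T (x) I_m) vec(X), with column-major vec
K = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))

# Solve the mn-by-mn linear system and reshape back to m-by-n
x = np.linalg.solve(K, C.flatten(order="F"))
X = x.reshape((m, n), order="F")

# X indeed satisfies the Sylvester equation
assert np.allclose(A @ X - X @ B, C)
```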

Assume that $A$ and $B$ do not share any eigenvalue. Let $X$ be a solution to the homogeneous equation above. Then $AX=XB$, which can be lifted to $A^kX = XB^k$ for each $k \ge 0$ by induction. Consequently, $$ p(A) X = X p(B) $$ for any polynomial $p$. In particular, let $p$ be the characteristic polynomial of $A$. Then $$p(A)=0$$ by the Cayley-Hamilton theorem; meanwhile, the spectral mapping theorem tells us that $$ \sigma(p(B)) = p(\sigma(B)), $$ where $\sigma(\cdot)$ denotes the spectrum of a matrix. Since $A$ and $B$ do not share any eigenvalue, $p(\sigma(B))$ does not contain $0$, and hence $p(B)$ is nonsingular. Thus $X = 0$, as desired. This proves the "if" part of the theorem.
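The two key facts in this argument, $p(A)=0$ and the invertibility of $p(B)$, can be checked numerically on a small example (a sketch assuming NumPy; the triangular matrices are arbitrary choices with the disjoint spectra $\{1,3\}$ and $\{4,5\}$):

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])   # eigenvalues 1, 3
B = np.array([[4.0, 1.0], [0.0, 5.0]])   # eigenvalues 4, 5

# Coefficients of the characteristic polynomial of A (highest degree first)
p = np.poly(A)

def eval_poly(coeffs, M):
    """Evaluate a polynomial at a square matrix via Horner's scheme."""
    R = np.zeros_like(M)
    for c in coeffs:
        R = R @ M + c * np.eye(M.shape[0])
    return R

# Cayley-Hamilton: p(A) = 0
assert np.allclose(eval_poly(p, A), 0)

# Disjoint spectra: p(B) = (B - 1*I)(B - 3*I) is nonsingular
assert abs(np.linalg.det(eval_poly(p, B))) > 1e-9
```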

Now assume that $A$ and $B$ share an eigenvalue $\lambda$. Let $u$ be a corresponding right eigenvector for $A$, $v$ be a corresponding left eigenvector for $B$, and $X=u{v}^*$. Then $X\neq 0$, and $$ AX-XB = A(uv^*)-(uv^*)B = \lambda uv^*-\lambda uv^* = 0. $$ Hence $X$ is a nontrivial solution to the aforesaid homogeneous equation, justifying the "only if" part of the theorem. Q.E.D.
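The construction in this direction is concrete enough to verify directly (a sketch assuming NumPy; the example matrices are built to share the eigenvalue $2$):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # eigenvalues 2, 3
B = np.array([[2.0, 0.0], [5.0, 7.0]])   # eigenvalues 2, 7

u = np.array([[1.0], [0.0]])             # right eigenvector: A u = 2 u
v = np.array([[1.0], [0.0]])             # left eigenvector: v* B = 2 v*

X = u @ v.conj().T                       # rank-one candidate, X != 0
assert np.any(X != 0)

# X is a nontrivial solution of the homogeneous equation
assert np.allclose(A @ X - X @ B, 0)
```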

Remark. The theorem remains true if $\mathbb{C}$ is replaced by $\mathbb{R}$ everywhere. The proof of the "if" part is still applicable; for the "only if" part, note that both $\mathrm{Re}(uv^*)$ and $\mathrm{Im}(uv^*)$ satisfy the homogeneous equation $AX-XB=0$, and they cannot both be zero.
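For instance, taking real $A=B$ to be a rotation by $90^\circ$, which has the non-real eigenvalues $\pm i$, the recipe produces real solutions (a sketch assuming NumPy):

```python
import numpy as np

# Real matrices sharing the complex eigenvalue i (rotation by 90 degrees)
A = np.array([[0.0, -1.0], [1.0, 0.0]])
B = A.copy()

u = np.array([[1.0], [-1.0j]])        # A u = i u
v = np.array([[1.0], [-1.0j]])        # v* B = i v*

X = u @ v.conj().T                    # complex solution of AX - XB = 0
assert np.allclose(A @ X - X @ B, 0)

# Its real and imaginary parts are real solutions of the same equation
for Y in (X.real, X.imag):
    assert np.allclose(A @ Y - Y @ B, 0)

# ... and they are not both zero
assert np.any(X.real != 0) or np.any(X.imag != 0)
```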


The first implication is Bézout's identity for polynomials: it is the analogue, in the Euclidean domain of polynomials, of the familiar fact that coprime integers $x$ and $y$ admit integers $a$ and $b$ with $ax+by=1$.
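For concreteness, Bézout coefficients for two coprime polynomials can be computed with the extended Euclidean algorithm over $\mathbb{Q}[x]$ (a self-contained sketch; polynomials are coefficient lists, highest degree first, and all function names are ad hoc):

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (lists are highest degree first)."""
    while len(p) > 1 and p[0] == 0:
        p = p[1:]
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + p
    q = [0] * (n - len(q)) + q
    return trim([a + b for a, b in zip(p, q)])

def mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += Fraction(a) * Fraction(b)
    return trim(r)

def polydivmod(a, b):
    """Long division: return (quotient, remainder) with deg r < deg b."""
    q = [Fraction(0)]
    while a != [0] and len(a) >= len(b):
        c = Fraction(a[0]) / Fraction(b[0])
        term = [c] + [Fraction(0)] * (len(a) - len(b))
        q = add(q, term)
        a = add(a, mul([-1], mul(term, b)))
    return q, a

def xgcd(a, b):
    """Extended Euclid: return (g, s, t) with s*a + t*b = g = gcd(a, b)."""
    if b == [0]:
        return a, [Fraction(1)], [Fraction(0)]
    q, r = polydivmod(a, b)
    g, s, t = xgcd(b, r)
    # g = s*b + t*r and r = a - q*b  =>  g = t*a + (s - q*t)*b
    return g, t, add(s, mul(mul([-1], q), t))

# Example: x - 1 and x - 2 are coprime, so s*(x-1) + t*(x-2) is a
# nonzero constant
a, b = [1, -1], [1, -2]
g, s, t = xgcd(a, b)
assert len(g) == 1 and g != [0]          # gcd is a nonzero constant
assert add(mul(s, a), mul(t, b)) == g    # Bezout identity holds
```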


The second one can be seen inductively. $g(A)$ is a sum of monomials $A^k$, so by linearity it suffices to prove that $A^kX=X(-B)^k$ for every integer $k \geq 1$ (the constant term is immediate, since $I$ commutes with $X$). This follows by induction:

  • The base case is $AX=-XB$, which we already have.
  • If it is true for $k$ (viz. $A^kX=X(-B)^k$), then $$A^{k+1}X = A(A^kX)=A(X(-B)^k) = (AX)(-B)^k = (-XB)(-B)^k = X(-B)^{k+1},$$ where the second equality uses the induction hypothesis and the third uses the base case.

Hence it is true for all integers $k\geq 1$, and the implication follows.
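The identity $A^kX=X(-B)^k$ is easy to sanity-check numerically on a hand-built solution of $AX=-XB$ (a sketch assuming NumPy; the diagonal matrices are an arbitrary example):

```python
import numpy as np

A = np.diag([2.0, 3.0])
B = np.diag([-2.0, 1.0])
X = np.array([[1.0, 0.0], [0.0, 0.0]])   # built so that A X = -X B

assert np.allclose(A @ X, -X @ B)

# A^k X = X (-B)^k for every k >= 0
for k in range(5):
    lhs = np.linalg.matrix_power(A, k) @ X
    rhs = X @ np.linalg.matrix_power(-B, k)
    assert np.allclose(lhs, rhs)
```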


In case you don't already know how this works, I think it's very useful to see how eigenvectors/eigenvalues of $A$ and $B$ directly give you eigenvalues of the "Sylvester operator" $X \mapsto AX+XB$. Actually, it is conceptually clearer to work with the transpose $C=B^t$ instead of $B$, which makes no difference because $B$ and $C$ have the same spectral theory. So the operator is $$X \mapsto AX+XC^t : M_n(\mathbb{C}) \to M_n(\mathbb{C}).$$ The basic idea is revealed in the case where $A$ and $C$ are diagonalizable. Let $u_1,\ldots, u_n$ be a basis of eigenvectors for $A$ with accompanying eigenvalues $\lambda_1,\ldots,\lambda_n$. Let $w_1,\ldots,w_n$ be a basis of eigenvectors for $C$ with eigenvalues $\mu_1,\ldots,\mu_n$. One can check that the outer products \begin{align*} E_{ij} := u_i w_j^t && i,j = 1,\ldots,n \end{align*} are linearly independent in $M_n(\mathbb{C})$ and satisfy $$ AE_{ij} + E_{ij} C^t = (\lambda_i + \mu_j)E_{ij}.$$

The linear independence of the $E_{ij}$ is basically a manifestation of the tensor product isomorphism $\left(\bigoplus_i U_i\right) \otimes \left(\bigoplus_j W_j\right) \cong \bigoplus_{i,j} (U_i \otimes W_j)$ for vector spaces.

Conclusion: If $A$ and $C$ are diagonalizable, then so is $X \mapsto AX+XC^t$. Moreover, the eigenvalues of $X \mapsto AX+XC^t$ are precisely the sums $\lambda_i + \mu_j$, where $\lambda_i$ is an eigenvalue of $A$ and $\mu_j$ is an eigenvalue of $C$. In particular, as long as no eigenvalue of $A$ is the negative of an eigenvalue of $C$, the operator $X \mapsto AX+XC^t$ is invertible.
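This conclusion can be confirmed numerically: under column-major vectorization the operator $X \mapsto AX+XC^t$ has matrix $I\otimes A + C\otimes I$, and its spectrum should be $\{\lambda_i+\mu_j\}$. A sketch assuming NumPy, with symmetric matrices chosen so that both are diagonalizable with real eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2   # symmetric => diagonalizable, real spectrum
C = rng.standard_normal((n, n))
C = (C + C.T) / 2

# Matrix of X -> AX + XC^t under column-major vectorization
S = np.kron(np.eye(n), A) + np.kron(C, np.eye(n))

lam = np.linalg.eigvalsh(A)
mu = np.linalg.eigvalsh(C)
sums = np.sort([l + m for l in lam for m in mu])

# The spectrum of the Sylvester operator is exactly {lambda_i + mu_j}
assert np.allclose(np.sort(np.linalg.eigvals(S).real), sums)
```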

By working a bit harder, I guess you should be able to show analogous things about the generalized eigenspaces, leading to the result you're after. I just wanted to make sure this direct connection between the eigenvalues of $A$, $C$ and the operator $X \mapsto AX+XC^t$ was available to you.