Examples for proof of geometric vs. algebraic multiplicity

Forget about the indices and the seemingly complicated appearance of your image. Let us focus on key ideas.

If the geometric multiplicity of $\lambda$ is $r$ then, by definition, you know there are $r$ linearly independent (eigen)vectors satisfying $T(v_1)=\lambda\cdot v_1, ...,T(v_r)=\lambda\cdot v_r$. In your image these equations are just the first $r$ equations of the big first system.

Since your finite-dimensional vector space $V$ has some dimension $n$ and you already have those $r$ linearly independent vectors $v_i$, a standard theorem guarantees you can add another $s=n-r$ linearly independent vectors $w_1,...,w_s$ so that the $r+s=n$ independent vectors $v_1,...,v_r,w_1,...,w_s$ form a basis. The other $s$ equations are then the action/image of $T$ on these $w$ vectors, and they take the form of a linear combination of the $v$'s and $w$'s: you do not know how $T$ acts on them, but you do know each image is a vector of your space and therefore can be written as a linear combination of the basis given by the $v$ and $w$ vectors. Hence, for example, $$T(w_1)=\text{some vector} =\sum_{i=1}^{r}a_i\cdot v_i+\sum_{i=1}^{s}b_i\cdot w_i$$ and similarly for the images of the other $T(w_j)$. Since there are $s$ vectors $w_j$, $j=1,...,s$, you get the other $s$ equations of the system. In general the coefficients $a_i,b_i$ are different for each $T(w_j)$, so you make this explicit by letting the coefficients depend on $j$ as well, obtaining the $a_{ij},b_{ij}$ you see in your image ($a_{ij}$ is the coefficient of $v_i$ in the expansion of $T(w_j)$, and $b_{ij}$ is the coefficient of $w_i$). Thus you get your first system of equations, which is just the action of $T$ on your chosen basis formed by the $v$'s and $w$'s.

Now, you can arrange all this in matrix notation, since any operator $T$ has an associated matrix $M$ in every basis of $V$. The action $T(u)$ on a general (column) vector $u$ can then be written $T(u) = M\cdot u$. You can look this up in your book, or work it out yourself, to realize that the columns of the matrix $M$ must be the coordinates of the images under $T$ of the basis vectors. That is to say, the first column of $M$ must be the coordinates of the vector $T(v_1)$, the second column those of $T(v_2)$, etc. Since you have $r+s$ basis vectors, there are $r+s$ columns, each with $r+s$ components: the first $r$ columns are formed by the coordinates of the $T(v_i)$ in your basis, and since these are eigenvectors, each such column is all $0$ except for the $i$-th position, where you must write a $\lambda$, reflecting the fact that $T(v_i)=\lambda v_i$. In this way you get the first $r$ columns of $M$, where you can easily see a diagonal of $\lambda$'s, which is a vertical way of writing the coefficients of the first $r$ equations of the system discussed above. Similarly, the remaining $s$ columns of $M$ are the coordinates of the vectors $T(w_j)$. For example, column $r+1$ is given by $T(w_1)$: you just write vertically the coefficients $a_{i1}$ followed by the $b_{i1}$, and similarly for the others.

Thus, you arrive at a final form for your matrix $M$ which can be decomposed into blocks simply by looking at the arrangement of its coefficients. There is a square $r\times r$ submatrix given by the $\lambda$ diagonal, so you can write it as $\lambda \text{I}_r$, and below it an $s\times r$ zero submatrix, written just $0$ in your image. The $r\times s$ coefficients $a_{ij}$ are organized similarly in a matrix $\text{A}$ and the $s\times s$ coefficients $b_{ij}$ in a matrix $\text{B}$ (remember, we wrote the coefficients in columns, i.e. vertically; that is why your book says $\text{A}=(a_{ij})^t$, since $^t$ means transposition, which turns the horizontal coefficients of your first system of equations into vertical position, interchanging rows for columns). Then we arrive at $$M=\begin{pmatrix} \lambda\text{I}_r & A \\ 0 & B \end{pmatrix}$$

Finally, we must study the characteristic polynomial of $M$, given by $p_c(M)=\det (M-x\text{I}_n)$, to establish its roots and check the algebraic multiplicity of the root $x=\lambda$. Since $M$ has the diagonal block $\lambda\text{I}_r$, the determinant has the form $$ \det(M-x\text{I}_n) = \begin{vmatrix} (\lambda-x)\text{I}_r & A \\ 0 & B-x\text{I}_s \end{vmatrix} $$ and we can expand it along the columns of the first diagonal block to see that $\lambda$ is a root of the characteristic polynomial, since we get $p_c(M)=(\lambda-x)^r\det(B-x\text{I}_s)=(\lambda-x)^r p_c(B)$. This is because the determinant of a block upper-triangular matrix is the product of the determinants of its diagonal blocks, and the determinant of the diagonal block $(\lambda-x)\text{I}_r$ is the product of its diagonal entries, which gives the factor $(\lambda-x)^r$, i.e. the root $\lambda$ with multiplicity at least $r$.
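As a quick sanity check (my own addition, not part of your image's argument), you can verify this block-determinant factorization symbolically for a small case, say $r=s=2$ with fully symbolic coefficients, using sympy:

```python
# Sketch: check that det of the block upper-triangular matrix equals
# (lambda - x)^r * p_c(B) for r = s = 2 with symbolic entries a_ij, b_ij.
import sympy as sp

x, lam = sp.symbols('x lambda')
A = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f'a{i+1}{j+1}'))
B = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f'b{i+1}{j+1}'))

M = sp.BlockMatrix([[lam * sp.eye(2), A],
                    [sp.zeros(2, 2),  B]]).as_explicit()

char_poly = (M - x * sp.eye(4)).det()
expected  = (lam - x)**2 * (B - x * sp.eye(2)).det()
print(sp.expand(char_poly - expected) == 0)   # True
```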

Now, we do not know anything else about $B$, so we cannot conclude anything about its characteristic polynomial $p_c(B)$; hence $M$ has the eigenvalue $\lambda$ with geometric multiplicity $r$ and algebraic multiplicity AT LEAST $r$. Since finding the eigenvalues of $B$ is a separate problem, we do not know whether $p_c(B)$ has any more roots equal to $\lambda$, which would put more factors of $(\lambda-x)$ into $p_c(M)$. For example, if $B$ has the eigenvalue $\lambda$ with algebraic multiplicity $k$, we would get $$ p_c(M)=(\lambda-x)^r p_c(B)=(\lambda-x)^r (\lambda-x)^k\cdots = (\lambda-x)^{r+k}\cdots $$ hence establishing an algebraic multiplicity of $(r+k)$ for the eigenvalue $\lambda$ of $M$. In the best case $k=0$, i.e. $B$ does not have the eigenvalue $\lambda$, so the algebraic multiplicity in $M$ coincides with the geometric multiplicity $r$. Therefore $$\text{geometric multiplicity}= r\leqslant r+k=\text{algebraic multiplicity}\; .$$


An Example:

Suppose you have an operator $T:\mathbb{R}^4\rightarrow\mathbb{R}^4$ with only 2 independent eigenvectors $v_1, v_2$ for the eigenvalue $\lambda=2$, that is: $2=\dim\ker(T-\lambda\text{I}_4)=n-\text{rank}(T-\lambda\text{I}_4)= 4-2$ is its geometric multiplicity $m_g(\lambda)$. By definition you must have that $T(v_1)=\lambda v_1=2v_1$ and $T(v_2)=2v_2$.

Now $v_1,v_2$ are independent but not enough to form a basis of $V=\mathbb{R}^4$ since $n=\dim\mathbb{R}^4=4$, so you need another $n-m_g(\lambda)=2$ independent vectors $w_1,w_2$ so that the set $\{v_1,v_2,w_1,w_2\}$ is linearly independent and thus a basis of $\mathbb{R}^4$. You can always find such $w$'s since you know the canonical basis $\{e_1,e_2,e_3,e_4\}=\{(1,0,0,0),(0,1,0,0)...\}$ (think of them as columns), in which the eigenvectors $v_1,v_2$ can be expanded as linear combinations. It is not hard to see that you can exchange some $e_i$ of the set for $v_1$ as long as the set remains independent (since $v_1$ is a nonzero linear combination of the $e$'s, you can solve for one of them and express that particular $e_i$ as a linear combination of $v_1$ and the remaining $e$'s). This process gives you a new basis $\{v_1,e_2,e_3,e_4\}$ (I have numbered them so that $e_1$ was the one exchanged, but it need not be). Generalizing this reasoning you can complete any independent set of vectors to a basis, as sketched in the code below. So in our example we are sure a basis of $\mathbb{R}^4$ can be given by the set of vectors $\{v_1,v_2,w_1,w_2\}$ (we use general $w$'s since the starting basis need not be the canonical one, or we may need to take linear combinations along the exchange process).
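If you want to see this completion step as an algorithm, here is a minimal sympy sketch (the vectors $v_1,v_2$ below are made up just to illustrate the procedure, not taken from the example):

```python
# Complete two independent vectors to a basis of R^4 by appending standard basis
# vectors whenever they keep the set linearly independent.
import sympy as sp

v1 = sp.Matrix([1, 1, 0, 0])   # made-up independent vectors
v2 = sp.Matrix([0, 1, 1, 0])

basis = [v1, v2]
for j in range(4):
    e = sp.eye(4).col(j)
    if sp.Matrix.hstack(*basis, e).rank() > len(basis):
        basis.append(e)

print(len(basis))   # 4 -- here the completion picks e1 and e4
```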

To know the action of an operator $T$ on any vector $u=c_1v_1+c_2v_2+c_3w_1+c_4w_2$ we only need to know its action on the vectors of a basis, since we are dealing with linear operators, $T(au+bv)=aT(u)+bT(v)$: $$ T(u)=c_1T(v_1)+c_2T(v_2)+c_3T(w_1)+c_4T(w_2) $$ So we need $T(v_1),T(v_2),T(w_1),T(w_2)$ expressed as vectors in the chosen basis (i.e. the $a_{ij}$ and $b_{ij}$ of the theory above). In order to work with a concrete example, let us assume that the action of $T$ is given by the following equations, which are a particular instance of the system of equations in your image: $$ \begin{align*} & T(v_1)=\lambda v_1=2v_1 \\ & T(v_2)=\lambda v_2=2v_2 \\ & T(w_1)=1\cdot v_1+2\cdot v_2 + 1\cdot w_1 + \left(\frac{-2}{3}\right)\cdot w_2 \\ & T(w_2)=3\cdot v_1+4\cdot v_2 + 3\cdot w_1 + 4\cdot w_2 \end{align*} $$

This leads us to the matrix representation of $T$, since the vectors $T(v_1),\dots$ seen as columns of a matrix $M=(T(v_1),T(v_2),T(w_1),T(w_2))$ give the right expression for the action of $T$ on any vector $u$ written above: $$ T(u)=M\cdot u= (T(v_1),T(v_2),T(w_1),T(w_2))\cdot \begin{pmatrix} c_1\\ c_2\\ c_3\\ c_4 \end{pmatrix} $$ Thus, in our example we arrive at a matrix with the aforementioned block form $$ M=\left( \begin{array}{cccc} 2 & 0 & 1 & 3 \\ 0 & 2 & 2 & 4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & -\frac{2}{3} & 4 \end{array} \right) $$ Notice that each column of this matrix consists precisely of the coefficients of the corresponding row of the system of vector equations just above.
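For instance, here is a tiny sympy sketch (my own illustration) that builds $M$ column by column from the coordinate vectors of $T(v_1),T(v_2),T(w_1),T(w_2)$ in the basis $\{v_1,v_2,w_1,w_2\}$:

```python
# Each column of M is the coordinate vector of the image of a basis vector.
import sympy as sp

cols = [
    sp.Matrix([2, 0, 0, 0]),                     # T(v1) = 2*v1
    sp.Matrix([0, 2, 0, 0]),                     # T(v2) = 2*v2
    sp.Matrix([1, 2, 1, sp.Rational(-2, 3)]),    # T(w1) = v1 + 2*v2 + w1 - (2/3)*w2
    sp.Matrix([3, 4, 3, 4]),                     # T(w2) = 3*v1 + 4*v2 + 3*w1 + 4*w2
]
M = sp.Matrix.hstack(*cols)
print(M)   # reproduces the matrix displayed above
```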

Finally, to study the multiplicity of $\lambda=2$ we need to get the characteristic polynomial of $M$ which is $$ p_c(M)= \begin{vmatrix} 2-x & 0 & 1 & 3 \\ 0 & 2-x & 2 & 4 \\ 0 & 0 & 1-x & 3 \\ 0 & 0 & -\frac{2}{3} & 4-x \end{vmatrix} = (2-x)^2 \begin{vmatrix} 1-x & 3 \\ -\frac{2}{3} & 4-x \end{vmatrix} =(2-x)^2 p_c(B) $$ with the lower-right block $B=\left( \begin{array}{cccc} 1 & 3 \\ -\frac{2}{3} & 4 \end{array} \right)$.

However, $B$ itself has characteristic polynomial $p_c(B)=(2-x)(3-x)$, i.e. $\lambda=2$ is also an eigenvalue of $B$ (note that this is a different problem in a different space: $\mathbb{R}^2$). Therefore the characteristic polynomial of $M$, and hence of $T$, is $$ p_c(T)=p_c(M)=(2-x)^2(2-x)(3-x)=(2-x)^3(3-x)\Rightarrow m_a(\lambda)=3 $$ where, in the notation of the theory above, $r=2$ and $k=1$, which gives an algebraic multiplicity $m_a(2)=3$, while the geometric multiplicity is $2$ since $M$ has no more than $2$ independent eigenvectors for the eigenvalue $2$. Indeed, the number of independent eigenvectors is the dimension of the kernel of $M-\lambda\text{I}_4$ at $\lambda=2$: $$ M-2\text{I}_4=\left( \begin{array}{cccc} 0 & 0 & 1 & 3 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & -\frac{2}{3} & 2 \end{array} \right)\Rightarrow m_g(\lambda)=\dim\ker(M-\lambda\text{I}_4)=4-2 $$ since $\text{rank}(M-\lambda\text{I}_4)=2$, as the last two columns remain independent (here we used the Rank-Nullity theorem, the formula mentioned in the first paragraph of the example).
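If you want to double-check the whole example by machine, here is a short sympy sketch (my own verification, not part of the original answer):

```python
# Verify m_a(2) = 3 and m_g(2) = 2 for the example matrix M.
import sympy as sp

x = sp.symbols('x')
M = sp.Matrix([[2, 0, 1, 3],
               [0, 2, 2, 4],
               [0, 0, 1, 3],
               [0, 0, sp.Rational(-2, 3), 4]])

print(sp.factor((M - x * sp.eye(4)).det()))   # (x - 2)**3*(x - 3): algebraic multiplicity 3
print((M - 2 * sp.eye(4)).rank())             # 2, so m_g(2) = 4 - 2 = 2 by Rank-Nullity
```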


If the geometric multiplicity of $\lambda\in\mathbb C$ for the $n$-by-$n$ matrix $A\in M_n(\mathbb C)$ is at least $k$, then there are $k$ linearly independent eigenvectors $v_1$, $v_2$, $\dots$, $v_k$ corresponding to the eigenvalue $\lambda$. In other words, the system of equations $$(\lambda I-A)x=0$$ has a solution space of dimension at least $k$. This means that the rank of $\lambda I-A$ is at most $n-k$. It follows that by doing row operations (both permuting rows and adding to a row a multiple of another row) you can transform $\lambda I-A$ into a matrix $B$ whose last $k$ rows are zero.

Now let $t$ be a variable (as opposed to a scalar, like $\lambda$) and perform exactly the same row operations on the matrix $t I-A$, whose entries are now polynomials in $t$. This gives you a matrix $C$, with entries also polynomials in $t$, which becomes $B$ when you evaluate these polynomials at $\lambda$. In particular, all the matrix entries of $C$ in the last $k$ rows are polynomials which vanish at $\lambda$, so they are divisible by $t-\lambda$. It is immediate then that the determinant of $C$ is divisible by $(t-\lambda)^k$. Indeed, you can pull the common factor $t-\lambda$ from each of those rows outside of the determinant.

Since the determinants of $t I-A$ and $C$ differ at most by a sign (permuting rows changes the sign, and adding a multiple of one row to another changes nothing), this means that the characteristic polynomial of $A$, namely $\det(t I-A)$, is divisible by $(t-\lambda)^k$.

Notice that this is not the same argument as in the image you posted. I happen to like it much more, though :)

 

An example: Let $$A=\left( \begin{array}{cccc} -1 & 2 & 0 & 1 \\ -3 & 4 & 0 & 1 \\ -\frac{2}{3} & 0 & 2 & \frac{2}{3} \\ -3 & 2 & 0 & 3 \end{array} \right).$$ If on the matrix $A-2I$ I subtract the first row from the second and the fourth, then subtract $2/9$ times the first row from the third, and finally interchange the second and third rows, I get the matrix $$\left( \begin{array}{cccc} -3 & 2 & 0 & 1 \\ 0 & -\frac{4}{9} & 0 & \frac{4}{9} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right).$$ This is a matrix of rank $2$, with its last two rows zero, so in particular the geometric multiplicity of $2$ as an eigenvalue of $A$ is $2$. (Working with $A-2I$ instead of $2I-A$ makes no difference: the two matrices have the same rank, and their determinants differ only by a sign.)

If I now repeat the same operations on the matrix $A-tI$, I get $$\left( \begin{array}{cccc} -t-1 & 2 & 0 & 1 \\ -\frac{2}{9} (-t-1)-\frac{2}{3} & -\frac{4}{9} & 2-t & \frac{4}{9} \\ t-2 & 2-t & 0 & 0 \\ t-2 & 0 & 0 & 2-t \end{array} \right).$$ As predicted above, this matrix has in its last two rows polynomials which vanish at $2$. Computing the determinant $$\left| \begin{array}{cccc} -t-1 & 2 & 0 & 1 \\ -\frac{2}{9} (-t-1)-\frac{2}{3} & -\frac{4}{9} & 2-t & \frac{4}{9} \\ t-2 & 2-t & 0 & 0 \\ t-2 & 0 & 0 & 2-t \end{array} \right|$$ is very easy, and you can actually see without computing it in full that it is a polynomial divisible by $(t-2)^2$. This means that the algebraic multiplicity of $2$ as an eigenvalue is at least $2$.
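Here is a sympy sketch (again my own check, not part of the answer) that reproduces the row operations on $A-tI$ and confirms the divisibility claim:

```python
# Same elementary row operations as above, applied to A - t*I; the last two rows
# end up divisible by (t - 2), and hence so is the determinant, twice over.
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[-1, 2, 0, 1],
               [-3, 4, 0, 1],
               [sp.Rational(-2, 3), 0, 2, sp.Rational(2, 3)],
               [-3, 2, 0, 3]])

C = A - t * sp.eye(4)
C[1, :] = C[1, :] - C[0, :]                      # subtract row 1 from row 2
C[3, :] = C[3, :] - C[0, :]                      # subtract row 1 from row 4
C[2, :] = C[2, :] - sp.Rational(2, 9) * C[0, :]  # subtract (2/9)*row 1 from row 3
C = C.elementary_row_op('n<->m', row1=1, row2=2) # swap rows 2 and 3

print(C[2, :].applyfunc(sp.factor))   # entries t - 2, 2 - t, 0, 0: all vanish at t = 2
print(C[3, :].applyfunc(sp.factor))   # entries t - 2, 0, 0, 2 - t
print(sp.factor(C.det()))             # -(t - 2)**4: the row swap flips the sign, and the
                                      # determinant is indeed divisible by (t - 2)**2
```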


Basically, the idea is in the previous answers. Nevertheless, I think the following proof is easier to understand. Let's prove the following:

Fact: Suppose $\lambda_0$ is an eigenvalue of $A$ with geometric multiplicity $k$. Then its algebraic multiplicity is at least $k$.

Proof: By the assumption, we can find an orthonormal basis of the subspace $\mathrm{ker}(A-\lambda_0 I)$; let us denote it by $u_1,u_2,\cdots,u_k$. We can then complete it to an orthonormal basis of $\mathbb{R}^n$, denoted $\{u_1,u_2,\cdots,u_k,u_{k+1},\cdots,u_n\}$. Denote by $S$ the matrix formed by these vectors (as columns). Now let $\lambda$ be a variable; one easily verifies that $$ (A-\lambda I)S = \pmatrix{(\lambda_0-\lambda)u_1, \cdots (\lambda_0-\lambda)u_k, v_{k+1}(\lambda),\cdots, v_n(\lambda)}. $$ The right-hand side is a matrix whose first $k$ columns are of the form $$ (\lambda_0-\lambda) u_j, \quad j=1,\cdots,k. $$ The remaining columns have entries depending on $\lambda$ (in fact the entries are affine functions of $\lambda$), but we do not care about their exact form.

Taking determinants in the equation above, we get $$ \det(A-\lambda I) \cdot \det(S) = (\lambda_0-\lambda)^k \det\pmatrix{u_1,\cdots,u_k,v_{k+1}(\lambda),\cdots,v_n(\lambda)} = (\lambda_0-\lambda)^k g(\lambda) $$ where $g(\lambda)$ is a polynomial in $\lambda$.

Since the columns of $S$ are orthonormal, $S$ is an orthogonal matrix and $\det(S) = \pm 1$, so we see clearly that $\lambda_0$ is a root of the characteristic polynomial and its algebraic multiplicity is at least $k$. QED
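To see this proof in action on a concrete matrix, here is a sympy sketch (my own illustration; it reuses the matrix $A$ from the example in the previous answer, and the variable names are mine):

```python
# Build an orthonormal basis whose first k vectors span ker(A - lam0*I), form S,
# and check that det((A - lam*I) * S) carries the factor (lam0 - lam)^k.
import sympy as sp

lam = sp.symbols('lambda')
lam0 = 2
A = sp.Matrix([[-1, 2, 0, 1],
               [-3, 4, 0, 1],
               [sp.Rational(-2, 3), 0, 2, sp.Rational(2, 3)],
               [-3, 2, 0, 3]])

vecs = (A - lam0 * sp.eye(4)).nullspace()        # k = 2 independent eigenvectors
k = len(vecs)
for j in range(4):                               # complete to a basis of R^4
    e = sp.eye(4).col(j)
    if sp.Matrix.hstack(*vecs, e).rank() > len(vecs):
        vecs.append(e)

S = sp.Matrix.hstack(*sp.GramSchmidt(vecs, orthonormal=True))
print(sp.simplify(S.det()))                          # +-1, since S is orthogonal
p = sp.simplify(((A - lam * sp.eye(4)) * S).det())
print(sp.factor(p))   # +-(lambda - 2)**4: divisible by (lambda - 2)**k with k = 2
```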