I was looking at the definition of an orthogonal matrix, which is as follows:

Square matrix $Q$ is orthogonal if its columns are pairwise orthonormal, i.e.,

$$Q^TQ = I$$

Hence also

$$Q^T = Q^{-1}$$

I understood what it means for two vectors to be orthonormal: they need to be orthogonal and, in addition, each have length $1$.

I don't understand why we have:

$$Q^TQ = I$$

Could you please explain this to me?


Let $M$ be an $n \times n$ matrix, written in terms of its columns as

$$M =\begin{pmatrix} v_1 & v_2 & \cdots &v_n \end{pmatrix}$$

where $v_i$ is the $i$th column vector with $n$ components.

Now, consider $M^T \times M$:

$$M^T \times M = \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{pmatrix} \times \begin{pmatrix} v_1 & v_2 & \cdots &v_n \end{pmatrix} \\ = \begin{pmatrix} v_1 \cdot v_1 & v_1 \cdot v_2 & \ldots &v_1 \cdot v_n \\ v_2 \cdot v_1 & v_2 \cdot v_2 & \ldots &v_2 \cdot v_n \\ v_3 \cdot v_1 & v_3 \cdot v_2 & \ldots &v_3 \cdot v_n \\ \vdots& \vdots & \ddots &\vdots \\ v_n \cdot v_1 & v_n \cdot v_2 & \ldots &v_n \cdot v_n \end{pmatrix}$$

However, remember that the vectors $v_i$ that form the matrix $M$ are orthonormal: pairwise orthogonal and each of unit length. Hence,

$$v_i \cdot v_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

Using this to simplify the matrix product,

$$M^T \times M = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 &\cdots & 0 \\ \vdots & \vdots & \vdots &\ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} = I $$
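As a quick numerical sanity check of the computation above (a small Python sketch, not part of the mathematical argument), we can take a matrix whose columns are orthonormal — a $2 \times 2$ rotation matrix, say — and confirm that $M^T M$ comes out as the identity up to rounding:

```python
import math

def transpose(M):
    # Rows of the transpose are the columns of M
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    # Entry (i, j) is the dot product of row i of A with column j of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# A 2x2 rotation matrix: its columns are orthonormal for any angle theta
theta = 0.7
M = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

P = matmul(transpose(M), M)
# P is the 2x2 identity matrix, up to floating-point rounding
```

Any angle `theta` works, since $\cos^2\theta + \sin^2\theta = 1$ makes each column a unit vector and the two columns orthogonal.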


We need to recall how we compute each entry in a product $AB$ of matrices $A$ and $B$: The entry at row $i$, column $j$, is the scalar product of row $i$ from $A$, and column $j$ from $B$. Now, recall that the $i$'th row of $Q^T$ is the $i$'th column of $Q$ (draw up a matrix and its transpose to convince yourself of this). Thus the entry at row $i$, column $j$ in $Q^TQ$ is the scalar product of row $i$ in $Q^T$ and column $j$ in $Q$, or columns $i$ and $j$ from $Q$. Then if $i\neq j$ the entry is zero, and if $i=j$ the entry is one, by the orthonormality of the columns of $Q$.
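The entry-wise argument above can be mirrored in a short Python sketch (the rotation matrix and the `dot` helper are just illustrative choices): entry $(i,j)$ of $Q^TQ$ is computed directly as the dot product of columns $i$ and $j$ of $Q$:

```python
import math

theta = 0.3
# Columns of this rotation matrix are orthonormal
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

cols = list(zip(*Q))  # cols[i] is the i-th column of Q
n = len(Q)

# Entry (i, j) of Q^T Q is exactly the dot product of columns i and j of Q
QtQ = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]
# Diagonal entries are 1 (unit columns); off-diagonal entries are 0 (orthogonal columns)
```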


That is because, if we denote by $C_i$ the column vectors of $Q$, the coefficient $a_{ij}$ of ${}^{\mathrm t\mkern-2mu}Q\, Q$ is precisely $\langle C_i,C_j\rangle$, which is $1$ when $i=j$ and $0$ otherwise.


Let $Q$ be a matrix with columns $C_1, C_2, C_3, \ldots, C_n$. Then $Q^T$ would be a matrix with rows $C_1^T, C_2^T, C_3^T, \ldots, C_n^T$. Now $Q^TQ$ means we have to multiply these matrices. The entry in the 1st row and 1st column of the result $Q^TQ$ is the product of the first row of $Q^T$ with the first column of $Q$. That is the scalar $1$, because we are multiplying $C_1^T$ with $C_1$, and $C_1$ is a unit vector, so $C_1^TC_1 = \|C_1\|^2 = 1$.

The entry in the 1st row and 2nd column of the product $Q^TQ$ is the product of the first row of $Q^T$ with the second column of $Q$. That is the scalar $0$, because we are multiplying $C_1^T$ with $C_2$, and $C_1$ and $C_2$ are orthogonal to each other.

Reasoning in this way for every entry, you can conclude that $Q^TQ=I$.


Let's start from the definition of orthogonal matrix you may feel more comfortable with: a square matrix $Q \in \mathbb{R}^{n\times n}$ is said to be orthogonal if and only if its rows and its columns each form an orthonormal basis of $\mathbb{R}^n$, that is, if, calling $r_{Q_i}$ and $c_{Q_i}$ the $i$-th row and column of $Q$ respectively, represented as column vectors, the following relations hold: $$r_{Q_i}^T r_{Q_j} = \delta_{ij}$$ $$c_{Q_i}^T c_{Q_j} = \delta_{ij}$$ where: $$\delta_{ij} = \begin{cases} 1,\; i=j \\ 0,\; i\ne j \end{cases}$$ and, of course, $a^T b := \sum\limits_{i=1}^{n}a_ib_i$ is the inner product of two vectors $a, b \in \mathbb{R}^n$, written as if they were column vectors.

Then the definition you have found in the textbook is plain to see, since: $$Q = \begin{pmatrix} r_{Q_1}^T \\ r_{Q_2}^T \\ \vdots \\ r_{Q_n}^T \end{pmatrix} \quad Q^T = \begin{pmatrix} r_{Q_1} & r_{Q_2} & \cdots & r_{Q_n} \end{pmatrix}$$

$$QQ^T = \begin{pmatrix} r_{Q_1}^T r_{Q_1} & r_{Q_1}^T r_{Q_2} & \cdots & r_{Q_1}^T r_{Q_n} \\ r_{Q_2}^T r_{Q_1} & r_{Q_2}^T r_{Q_2} & \cdots & r_{Q_2}^T r_{Q_n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{Q_n}^T r_{Q_1} & r_{Q_n}^T r_{Q_2} & \cdots & r_{Q_n}^Tr_{Q_n} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = I_n$$

Of course, $Q^TQ = I$ can be proved by considering the columns instead of the rows: $$Q = \begin{pmatrix} c_{Q_1} & c_{Q_2} & \cdots & c_{Q_n} \end{pmatrix} \quad Q^T = \begin{pmatrix} c_{Q_1}^T \\ c_{Q_2}^T \\ \vdots \\ c_{Q_n}^T \end{pmatrix} $$ and then repeating the same computation with $c_{Q_i}$ in place of $r_{Q_i}$.
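To see both identities $QQ^T = I$ and $Q^TQ = I$ at once without floating-point noise, here is a small Python sketch using a permutation matrix (chosen only because a permutation matrix is orthogonal with exact integer entries):

```python
# A permutation matrix: each row and each column contains a single 1,
# so rows and columns are both orthonormal sets.
Q = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

def matmul(A, B):
    # Entry (i, j) is the dot product of row i of A with column j of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

Qt = [list(row) for row in zip(*Q)]
I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

QQt = matmul(Q, Qt)   # identity, since the rows of Q are orthonormal
QtQ = matmul(Qt, Q)   # identity, since the columns of Q are orthonormal
```

Because all arithmetic here is on integers, both products equal the identity matrix exactly, not just approximately.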