What is an intuitive way to understand the dot product in the context of matrix multiplication?

Solution 1:

We have $x = x_i e_i = x'_i e'_i$ where $e_i$ and $e'_i$ are bases related by a nonsingular linear transformation. Note that ${e'}_i^T e'_j = g_{ij}$, where $g$ is invertible. Thus, ${e'}_i^T e_j x_j = {e'}_i^T e'_j x'_j = g_{ij}x'_j$, or $$x'_i = (g^{-1})_{ij}{e'}_j^T e_k x_k = (g^{-1})_{ij}{e'}_j^T x.$$ This gives us two good pieces of intuition. First, for a nonsingular linear transformation $A$ we can think of the elements of $A$ as being given by $$a_{ij} = (g^{-1})_{ik}{e'}_k^T e_j,$$ that is, by the dot product of a certain linear combination of the transformed basis vectors with the untransformed basis vectors. Second, to find the result of applying $A$ to $x$ we simply dot the same linear combination of the transformed basis vectors with the vector $x$.

For orthogonal transformations we find $g_{ij} = \delta_{ij}$ and so $$x'_i = {e'}_i^T e_j x_j = {e'}_i^T x \hspace{5ex}\textrm{and}\hspace{5ex} a_{ij} = {e'}_i^T e_j.$$

Note: We use Einstein's summation convention, $x = x^i e_i \equiv \sum_i x^i e_i$. For this problem the dual basis is $e^i = e_i^T$. The dual of $x$ is $x^T$, so $x_i e^i = x^i e_i^T$. We need not distinguish between $x_i$ and $x^i$ and so we write $x = x_i e_i$.
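
To make the general formulas concrete, here is a minimal NumPy sketch (assuming the $e_i$ are the standard basis of $\mathbb{R}^3$ and taking a randomly chosen nonsingular matrix whose columns play the role of the $e'_i$); it checks that $x'_i = (g^{-1})_{ij}{e'}_j^T x$ gives the new coordinates and that $a_{ij} = (g^{-1})_{ik}{e'}_k^T e_j$ is the matrix that produces them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Columns of Ep are the new basis vectors e'_i (generally non-orthogonal),
# written in the standard basis e_i; we assume this matrix is nonsingular.
Ep = rng.normal(size=(n, n))
g = Ep.T @ Ep                          # Gram matrix: g_ij = e'_i . e'_j

x = rng.normal(size=n)                 # components of x in the standard basis

# x'_i = (g^{-1})_{ij} e'_j . x : coordinates of x in the new basis
x_prime = np.linalg.solve(g, Ep.T @ x)

# a_ij = (g^{-1})_{ik} e'_k . e_j ; with e_j the standard basis this is g^{-1} Ep^T
A = np.linalg.solve(g, Ep.T)

print(np.allclose(A @ x, x_prime))     # True: A sends old coordinates to new ones
print(np.allclose(Ep @ x_prime, x))    # True: sum_i x'_i e'_i recovers x
```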

Example

Let $$A = \left(\begin{array}{cc}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{array}\right).$$ Then $$\left(\begin{array}{cc}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{array}\right) \left(\begin{array}{c}x \\ y\end{array}\right)$$ will give the components of $x$ in the new basis $e'_i$, where $[e_i]_j = \delta_{ij}$ is the standard basis. (This is a passive, rather than active, transformation.) It is straightforward to show that $e'_i = A^{-1}e_i = A^T e_i,$ so $$e'_1 = \left(\begin{array}{c}\cos\theta \\ \sin\theta\end{array}\right) \hspace{5ex}\textrm{and}\hspace{5ex} e'_2 = \left(\begin{array}{c}-\sin\theta \\ \cos\theta\end{array}\right).$$ One can then easily check that the elements of $A$ are given by $a_{ij} = {e'}_i^T e_j$. Note that $$x'_1 = {e'}_1^T e_j x_j = \left(\begin{array}{cc}\cos\theta & \sin\theta\end{array}\right) \left(\begin{array}{c}x \\ y\end{array}\right)$$ and $$x'_2 = {e'}_2^T e_j x_j = \left(\begin{array}{cc}-\sin\theta & \cos\theta\end{array}\right) \left(\begin{array}{c}x \\ y\end{array}\right),$$ as expected.
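
The same check can be run on the rotation example; a short sketch (with an arbitrary angle chosen for illustration):

```python
import numpy as np

theta = 0.3                                    # arbitrary angle for illustration
A = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

e = np.eye(2)                                  # standard basis e_1, e_2 as columns
Ep = A.T @ e                                   # new basis e'_i = A^T e_i, as columns

# a_ij = e'_i . e_j recovers the matrix entries
print(np.allclose(Ep.T @ e, A))                # True

v = np.array([1.0, 2.0])
# x'_i = e'_i . v gives the components of v in the rotated basis,
# which is exactly what A v computes
print(np.allclose(Ep.T @ v, A @ v))            # True
```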

Solution 2:

It is easiest to see this in one dimension first.

Our goal is to show that any linear transformation $T: \mathbb{R}^n \rightarrow \mathbb{R}$ can be represented in the form $Tu = \beta^Tu$ for some $n$-dimensional vector $\beta$. Say that $u \in \mathbb{R}^n$; let $e_1, \ldots, e_n$ be the standard basis vectors for $\mathbb{R}^n$ (where $e_i$ has a $1$ in the $i$-th position and $0$'s elsewhere). Then we can write $u = \sum_i u_i e_i$ where $u_i$ is the $i$-th coordinate of $u$. Since $T$ is linear, we have $$Tu =T \sum_i u_i e_i = \sum_i u_i Te_i = \sum_i u_i \beta_i,$$ where $\beta_i = Te_i$. This means that with respect to the standard basis, $Tu = \beta \cdot u$ where $\beta$ is the vector $(\beta_1, \ldots, \beta_n)$. Thus every linear map from $\mathbb{R}^n$ to $\mathbb{R}$ can be represented by taking the dot product with a fixed vector.
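
A minimal NumPy sketch of this construction (the particular functional and vectors are made up for illustration): $T$ is treated as a black box, and $\beta$ is recovered by applying it to the standard basis.

```python
import numpy as np

n = 4
beta_true = np.array([2.0, -1.0, 0.5, 3.0])    # hidden inside T

def T(u):
    """A linear map R^n -> R (treated as a black box below)."""
    return float(beta_true @ u)

# beta_i = T(e_i): apply T to the standard basis vectors
e = np.eye(n)
beta = np.array([T(e[:, i]) for i in range(n)])

u = np.array([1.0, 2.0, 3.0, 4.0])
print(np.isclose(T(u), beta @ u))              # True: Tu = beta . u
```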

Now for the multi-dimensional case: If $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ then $T$ is equivalent to the $m$-tuple of functions $(T_1, T_2, \ldots, T_m)$ where $T_jx$ is the $j$-th coordinate of $Tx$. Then for each $j$ there is a vector $\beta^j$ with $T_jx = \beta^j \cdot x$ and the result follows. (Note that the $j$ in $\beta^j$ is just a superscript here meaning the $j$-th one, and doesn't have anything to do with the $j$-th power.)
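
Continuing the sketch above, each $\beta^j$ can be recovered the same way; stacking them as rows rebuilds the matrix of $T$ (again with a made-up example map):

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],                 # an example linear map R^3 -> R^2
              [4.0, 5.0, 6.0]])
m, n = M.shape

def T(x):
    return M @ x

# Applying T to the standard basis vectors rebuilds the matrix column by column;
# the j-th row of that matrix is beta^j, the vector representing T_j.
e = np.eye(n)
betas = np.column_stack([T(e[:, i]) for i in range(n)])

x = np.array([1.0, -1.0, 2.0])
print(np.allclose(T(x), [betas[j] @ x for j in range(m)]))   # True: T_j x = beta^j . x
```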

Solution 3:

I think an intuitive way to think about matrix multiplication is to regard it as a combination of coordinate extraction and scalar multiplication. First, given any basis for a finite-dimensional vector space such as $\mathbb{R}^n$, we usually express a vector $\mathbf{v}=\sum_i c_i \mathbf{e}_i=[c_1,c_2,\dots,c_n]$ as an $n$-tuple of scalars. The function that extracts the $i$-th coordinate $c_i$ of a vector $\mathbf{v}$ is a linear functional, and these coordinate functionals form a basis for the dual space. They can be thought of as akin to projection maps.

Second, given any scalar $c$, the operation of multiplying a vector $\mathbf{v}$ by $c$ to produce $c\mathbf{v}$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. The composition of extracting the $j$-th coordinate of a vector $\mathbf{v}$ and then multiplying that scalar by a basis vector $\mathbf{e}_i$ of the codomain is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, which we denote by $T_{ij}.$ Now a matrix $\mathbf{A}$ with entries $a_{ij}$ is associated with the finite sum $\sum_{ij} a_{ij}T_{ij}$ as a linear transformation. Note that the identity map $I_n=\sum_i T_{ii}$ is the sum of the projection maps $T_{ii}.$
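
A small sketch of this decomposition (with an arbitrary $2\times 3$ matrix for illustration): each $T_{ij}$ extracts the $j$-th coordinate and scales the $i$-th basis vector of the codomain by it, and summing $a_{ij}T_{ij}$ reproduces $\mathbf{A}\mathbf{v}$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],                 # an arbitrary 2x3 example
              [4.0, 5.0, 6.0]])
m, n = A.shape

def T_ij(i, j, v):
    """Extract the j-th coordinate of v, then scale the i-th codomain basis vector by it."""
    e_i = np.zeros(m)
    e_i[i] = 1.0
    return v[j] * e_i

v = np.array([1.0, -1.0, 2.0])
Av = sum(A[i, j] * T_ij(i, j, v) for i in range(m) for j in range(n))
print(np.allclose(Av, A @ v))                  # True: A acts as sum_ij a_ij T_ij
```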

We can think of this in two ways. First, the $i$-th row of the matrix $\mathbf{A}$ is associated with the composite map $\mathbf{v} \mapsto (\sum_j a_{ij} c_j)\mathbf{e}_i$, which scales the basis vector $\mathbf{e}_i$ by the dot product of the $i$-th row with $\mathbf{v}$. Second, the $j$-th column of the matrix $\mathbf{A}$ is associated with the composite map $\mathbf{v} \mapsto c_j(\sum_i a_{ij}\mathbf{e}_i)$, which is the scalar $c_j$, the $j$-th coordinate of $\mathbf{v}$, times the $j$-th column vector of $\mathbf{A}$. Your original understanding of matrix multiplication was pretty good.
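
Both readings are easy to check numerically; a short sketch (same arbitrary example matrix as above):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([1.0, -1.0, 2.0])

# Row view: the i-th output component is the dot product of the i-th row with v
row_view = np.array([A[i, :] @ v for i in range(A.shape[0])])

# Column view: the output is a combination of the columns, weighted by v's coordinates
col_view = sum(v[j] * A[:, j] for j in range(A.shape[1]))

print(np.allclose(row_view, A @ v), np.allclose(col_view, A @ v))   # True True
```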