Change of Basis Confusion

Maybe looking at the one-dimensional case will clarify the point of confusion. "Seconds" and "minutes" are both units of time, and each can be taken as a basis of a one-dimensional real vector space representing time.

If I ask what factor takes me from the basis {seconds} to the basis {minutes}, the answer is 60, since one minute is 60 seconds. (The 1-by-1 matrix consisting of the number 60 is the $T$ of the question.)

However, if I ask how many minutes 120 seconds is equal to, then the factor I need to apply is 1/60: 120 seconds is $120\cdot\frac{1}{60} = 2$ minutes.

In either case, I am "going from seconds to minutes", but in the first case I am changing the basis elements themselves, from the basis {seconds} to the basis {minutes}, while in the second case I am converting a fixed amount of time from seconds to minutes. The matrices in the two cases are inverses of each other.
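
The two factors can be checked numerically. The following is only a minimal sketch; the names `SECOND`, `MINUTE`, `basis_factor` and `coordinate_factor` are made up for illustration, and durations are stored internally in seconds.

```python
from fractions import Fraction

# One-dimensional "time" space; every duration is stored internally in seconds.
SECOND = Fraction(1)
MINUTE = Fraction(60)                    # the basis vector "minute" equals 60 * "second"

# Factor relating the basis vectors themselves: minute = 60 * second.
basis_factor = MINUTE / SECOND           # 60

# Factor relating the coordinates of one fixed duration.
coordinate_factor = SECOND / MINUTE      # 1/60

duration_in_seconds = Fraction(120)      # coordinate relative to {seconds}
duration_in_minutes = coordinate_factor * duration_in_seconds

print(duration_in_minutes)               # 2
print(basis_factor * coordinate_factor)  # 1
```

The product of the two factors is 1, which is the one-dimensional version of "the matrices are inverses of each other".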

The difference in terminology depends on which of these procedures you think should be called the "change of basis matrix".


The "change of basis matrix from $\beta$ to $\gamma$" or "change of coordinates matrix from $\beta$-coordinates to $\gamma$-coordinates" is the matrix $A$ with the property that for every vector $v\in V$, $$A[v]_{\beta} = [v]_{\gamma},$$ where $[x]_{\alpha}$ is the coordinate vector of $x$ relative to $\alpha$. This matrix $A$ is obtained by considering the coordinate matrix of the identity linear transformation, from $V$-with-basis-$\beta$ to $V$-with-basis-$\gamma$; i.e., $[\mathrm{I}_V]_{\beta}^{\gamma}$.

Now, you say you want to take $T\colon V\to V$ that sends $v_i$ to $w_i$, and consider "the matrix of this linear transformation". Which matrix? With respect to what basis? The matrix of $T$ relative to $\beta$ and $\gamma$, $[T]_{\beta}^{\gamma}$, is just the identity matrix. So not that one.

Now, if you take $[T]_{\beta}^{\beta}$; i.e., you express the vectors $w_i$ in terms of the vectors $v_i$, what do you get? You get the matrix that takes $[x]_{\gamma}$ and gives you $[x]_{\beta}$; that is, you get the change-of-coordinates matrix from $\gamma$ to $\beta$. To see this, note for example that $[w_1]_{\gamma} = (1,0,0,\ldots,0)^t$, so $[T]_{\beta}^{\beta}[w_1]_\gamma$ is the first column of $[T]_{\beta}^{\beta}$, which is how you express $w_1$ in terms of $\beta$.

This is why it would be the "change of basis matrix from $\gamma$ to $\beta$": as Qiaochu mentions in the answer I linked to, the "translation" of coordinate vectors achieved by this matrix goes "the other way". It translates from $\gamma$-coordinates to $\beta$-coordinates, even though you "defined" $T$ as "going" from $\beta$ to $\gamma$.
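
To make this concrete, here is a small sketch in the plane. The bases `beta` and `gamma` are arbitrary illustrative choices, and the helpers `coords`, `matrix_of` and `apply` are hypothetical names written just for this example (coordinates are found with Cramer's rule, so it only handles the 2-by-2 case).

```python
from fractions import Fraction

def coords(basis, x):
    """Coordinates of the plane vector x relative to the basis (b1, b2), via Cramer's rule."""
    (a, c), (b, d) = basis            # b1 = (a, c), b2 = (b, d) in standard coordinates
    det = a * d - b * c               # nonzero because (b1, b2) is a basis
    p, q = x
    return (Fraction(p * d - b * q, det), Fraction(a * q - c * p, det))

def matrix_of(images, basis):
    """Matrix (list of rows) whose j-th column holds the coordinates of images[j] relative to basis."""
    columns = [coords(basis, y) for y in images]
    return [[col[i] for col in columns] for i in range(2)]

def apply(matrix, vec):
    """Matrix-vector product."""
    return tuple(sum(m * t for m, t in zip(row, vec)) for row in matrix)

beta = ((1, 1), (1, -1))    # v_1, v_2 (illustrative choice)
gamma = ((1, 0), (1, 1))    # w_1, w_2 (illustrative choice)

# T sends v_i to w_i, so the images of the beta vectors are exactly the gamma vectors.
T_beta_gamma = matrix_of(gamma, gamma)   # [T]_beta^gamma: columns are [w_j]_gamma
T_beta_beta = matrix_of(gamma, beta)     # [T]_beta^beta:  columns are [w_j]_beta
A = matrix_of(beta, gamma)               # [I]_beta^gamma: columns are [v_j]_gamma

assert T_beta_gamma == [[1, 0], [0, 1]]  # just the identity matrix

x = (5, 1)                               # a fixed vector, in standard coordinates
# [T]_beta^beta translates gamma-coordinates into beta-coordinates ...
assert apply(T_beta_beta, coords(gamma, x)) == coords(beta, x)
# ... while A translates beta-coordinates into gamma-coordinates.
assert apply(A, coords(beta, x)) == coords(gamma, x)
# The two matrices undo each other on the standard coordinate vectors.
assert apply(A, apply(T_beta_beta, (1, 0))) == (1, 0)
assert apply(A, apply(T_beta_beta, (0, 1))) == (0, 1)
```

With these choices, `T_beta_beta` has columns $(\tfrac12,\tfrac12)^t$ and $(1,0)^t$, which are exactly the $\beta$-coordinates of $w_1$ and $w_2$.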


If $(u_1,\ldots,u_n)$ and $(w_1,\ldots,w_n)$ are bases of $V$ then there is indeed a unique linear transformation $T:\ V\to V$ such that $T(u_i)=w_i$ $(1\leq i\leq n)$, but this transformation is of no help in understanding what is going on here.

What is at stake is the following: Any vector $x\in V$ has some coordinates $(x_1,\ldots, x_n)$ with respect to the "old" basis $(u_1,\ldots,u_n)$ and another set of coordinates $(\bar x_1,\ldots, \bar x_n)$ with respect to the "new" basis $(w_1,\ldots,w_n)$. The vectors $x$ do not move, but you want to know the connection between the $x_k$ and the $\bar x_i$.

The data about this coordinate transformation are stored in a matrix $T=(t_{ik})_{1\leq i\leq n,\ 1\leq k\leq n}$ in the following way: Any "new" basis vector $w_i$ is a linear combination of the old basis vectors $u_k$, therefore there are (given) numbers $t_{ki}$ such that $$w_i=\sum_{k=1}^n t_{ki} u_k\ .$$ This is to say that in the columns of $T$ we see the "old coordinates" of the "new" basis vectors. Now any vector $x\in V$ has "new coordinates" $\bar x_i$. Writing this out we have $$x=\sum_{i=1}^n \bar x_i w_i= \sum_{i,k} \bar x_i t_{ki} u_k= \sum_{k=1}^n \Bigl(\sum_{i=1}^n {t_{ki} \bar x_i}\Bigr) u_k\ ,$$ and we see that the "old coordinates" $x_k$ of the same vector $x\in V$ are given by $$x_k\ =\ \sum_{i=1}^n t_{ki}\bar x_i\ .$$ If we write our "coordinate vectors" as column vectors we therefore have the formula $x=T\ \bar x$.

One has to get accustomed to the fact that the symbol $x$ denotes at the same time the "geometric object" $x$ and its "coordinate vector" with respect to the "old basis".
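
The formula $x_k=\sum_i t_{ki}\bar x_i$ can be checked on a small example. The sketch below assumes $n=3$; the old basis `u`, the matrix `T` and the new coordinates `x_new` are arbitrary illustrative values, and `combine` is a hypothetical helper for linear combinations.

```python
n = 3

def combine(coeffs, vectors):
    """Linear combination sum_j coeffs[j] * vectors[j], computed componentwise."""
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vectors))
                 for j in range(len(vectors[0])))

# Old basis u_1, u_2, u_3, written in standard coordinates (illustrative choice).
u = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]

# T[k][i] = t_{ki}: column i holds the old coordinates of the new basis vector w_i.
T = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]

# New basis vectors: w_i = sum_k t_{ki} u_k.
w = [combine([T[k][i] for k in range(n)], u) for i in range(n)]

x_new = (2, -1, 3)   # coordinates of x relative to the new basis (w_i)

# Old coordinates from the formula x_k = sum_i t_{ki} * x_new_i, i.e. x = T x_new.
x_old = tuple(sum(T[k][i] * x_new[i] for i in range(n)) for k in range(n))

# Both coordinate tuples describe the same geometric vector.
assert combine(x_new, w) == combine(x_old, u)
print(x_old)         # (1, 2, 3)
```

Note that the vector itself never moves: `combine(x_new, w)` and `combine(x_old, u)` are the same point, only its coordinates differ between the two bases.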