Let us look at two coordinate systems $K$ and $K'$ with axes $(x_1,x_2,x_3)$ and $(x_1',x_2',x_3')$ and unit vectors $(\vec{e_1},\vec{e_2},\vec{e_3})$ and $(\vec{e_1'},\vec{e_2'},\vec{e_3'})$ respectively, so that $x_i$ and $x_j'$ ($i=1,2,3$ and $j=1,2,3$) are the Cartesian coordinates of an arbitrary point $P$ in the two systems. $a_i$ are the coordinates of the origin $O'$ of system $K'$ with respect to $K$.
Now, $b_{ij}$ are the direction cosines of the axes $O'x_j'$ of system $K'$ with respect to the axes $Ox_i$ of $K$. Then this should be true:
$$\begin{aligned} x_1 &= a_1 + b_{11}x_1' + b_{12}x_2' + b_{13}x_3' \\ x_2 &= a_2 + b_{21}x_1' + b_{22}x_2' + b_{23}x_3' \\ x_3 &= a_3 + b_{31}x_1' + b_{32}x_2' + b_{33}x_3' \end{aligned}$$

However, for the life of me I cannot figure out why. As far as I know, direction cosines are the cosines of the angles between a vector and the three coordinate axes. So $b_{11}$ would be $= \frac {\vec{Ox_1}}{\|\vec{O'x_1'}\|} = \frac{\vec{Ox_1}}{\sqrt{(Ox)_1^2 + (Ox)_2^2 + (Ox)_3^2}}$, OR it would simply be $\frac {x_1}{x_1'}$ -- I don't know which, how, or why.

But I ALSO read that if we take a vector $H$ that exists in both the $K$ and $K'$ systems (with coordinates $H_1,H_2,H_3$ and $H_1',H_2',H_3'$ respectively), then that vector can be written as $$H = H_1i_1 + H_2i_2 + H_3i_3 \text{ and } H = H_1'j_1 + H_2'j_2 + H_3'j_3$$

and since $H=H$ we can find the components of the vector $H$ in the $K'$ system in terms of the components of $H$ in the $K$ system by simply taking the dot product of this equation with the desired unit vector $e$ in the $K'$ system, so that $$H_j' = H_1\,i_j' \cdot i_1 + H_2\,i_j' \cdot i_2 + H_3\,i_j' \cdot i_3 \qquad \text{where } j = 1,2,3$$

and this I don't get at all; as far as I know, the dot product gives you a scalar.

I know this is terrible notation; I am trying to bridge my two primary sources of information on the matter. One is a textbook, the other is this PDF from ocw.mit.edu (page 9).


Solution 1:

Coordinate Transformations: A short treatise
by Bye_World

First off, let's go over the Einstein summation convention because it's pretty useful for this type of thing.

Let $\vec v \in \Bbb R^3$ be given by $\vec v=v^1\hat e_1 + v^2\hat e_2 + v^3\hat e_3 = \sum_{i=1}^3 v^i\hat e_i$ with respect to the orthonormal basis $\{\hat e_1, \hat e_2, \hat e_3\}$. Here the superscripts on the scalars $v^1, v^2, v^3$ are NOT EXPONENTS. They are just indices, the same as on $\hat e_1,$ etc. The reason to use this superscript index notation is to distinguish between objects which transform contravariantly vs covariantly. You can look those terms up if you'd like more information, but it's not important for you to know what the difference is right now -- you should just take it as notation.

NOTE: You should notice here that $\vec v$ is a vector, $v^1$ is a scalar, and $\hat e_1$ is a vector.

The Einstein summation convention is just a suppression of the summation symbol $\sum$ in this type of expression. So the equation $\vec v = \sum_{i=1}^3 v^i\hat e_i$ would instead just be written $\vec v = v^i\hat e_i$ where summation over the $i$'s is implied.

The exact details are that:

  1. summation is always implied when the same index occurs once as a superscript and once as a subscript. Such indices are called summing indices or dummy indices. Example: $\vec v=\sum_{i=1}^3v^i\hat e_i = v^i\hat e_i$. Here $i$ is the only summing index.
  2. if there is an index which is unpaired, then it is not summed over and must appear on BOTH sides of the equation. Such indices are called free indices. Example: $\vec x_a = R^b_a \hat e_b = \sum_{b=1}^3 R^b_a \hat e_b$. Here $a$ is the free index and $b$ is the summing index. Notice here that $\vec x_a$ is a vector indexed by $a$ -- it is the $a$th vector in the set $\{\vec x_1, \vec x_2, \dots\}$, where I've left these vectors unspecified --, $R^b_a$ is a scalar indexed by $2$ indices -- where, in general, $R^1_2 \ne R^2_1$ --, and $\hat e_b$ is the $b$th standard basis vector.
  3. if the same index appears more than once as a subscript, or more than once as a superscript, in a single term, then you CANNOT use the Einstein summation convention. Example: $A^a_bB^c_bx^b\hat e_c$ is not a valid expression using the Einstein summation convention because $b$ appears as a subscript twice (and as a superscript once) in this single term, and thus the summation is ambiguous. If such a sum occurs, you have to write the $\sum$ symbol explicitly.
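
If you'd like to check the convention with actual numbers, here's a minimal NumPy sketch (the vector components are made up, and `np.einsum` is used only because its notation happens to mirror the convention):

```python
import numpy as np

# A made-up example: the components v^i of a vector and the standard orthonormal basis.
v_components = np.array([2.0, -1.0, 3.0])   # the scalars v^1, v^2, v^3
e = np.eye(3)                               # row e[i] is the basis vector e_{i+1}

# v = v^i e_i : summation over the repeated index i is implied (and written out here).
v = sum(v_components[i] * e[i] for i in range(3))
print(v)                                    # [ 2. -1.  3.]

# The same contraction written with np.einsum, whose index notation mirrors the convention.
print(np.einsum('i,ij->j', v_components, e))
```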

You may have noticed the object $R_a^b$ in the above and wondered what it meant. Different types of objects are indexed differently.

Vectors are objects whose components can be distinguished by $1$ index. A vector can always be written as a column matrix where each entry is one of its components (another word for "components" is "coordinates"). For example $\vec v = \begin{bmatrix} v^1 \\ v^2 \\ v^3 \end{bmatrix}$. This is a $1$ dimensional array of the scalars $v^1, v^2, v^3$ and thus requires only $1$ index to specify any of its components.

A more general matrix, on the other hand, requires $2$ indices. For example, if $R=\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$, then to specify a single component we need to specify both the row & the column. By convention, the entry in the $a$th row and $b$th column of the matrix $R$ is denoted $R^a_b$. Keep in mind that this component $R^a_b$ is a scalar, not a matrix. In this example, $R^1_2 = -\sin(\theta)$. We can see then that matrix multiplication is denoted, using the Einstein summation convention, as $X^a_c=Y^a_bZ^b_c$, where $X, Y, Z$ are matrices (multiply two matrices out and make sure that this DOES in fact produce the correct result).
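
If you want to convince yourself numerically that $X^a_c=Y^a_bZ^b_c$ really is ordinary matrix multiplication, here's a quick sketch with arbitrary matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 3))   # arbitrary matrices
Z = rng.standard_normal((3, 3))

# X^a_c = Y^a_b Z^b_c : sum over the repeated index b ...
X_index = np.einsum('ab,bc->ac', Y, Z)
# ... is exactly ordinary matrix multiplication.
X_matrix = Y @ Z

print(np.allclose(X_index, X_matrix))   # True
```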

There are higher dimensional objects called tensors which require more than $2$ indices, but again, it's not super important for you to know about them, yet, and I won't be going over them here.


Now, using this new convention, let's start talking about coordinate transformations. The first thing you should know about coordinate transformations is the difference between passive and active transformations.


An active transformation is when you transform an object -- here the only objects I'll consider are vectors -- directly. An active transformation does not change the space that the object is in, nor does it change the coordinate system (that is, the set of basis vectors). The only things that change are the coordinates of a single vector (or more general object). So we'd like a mapping $T: \Bbb R^n \to \Bbb R^n$ given by $T(\vec v) = \vec w$.

Let's look at an example in the plane with respect to the standard basis. Let's say you wanted to rotate your vector $\vec v=v^{\alpha}\hat e_{\alpha}$ clockwise by $\pi/4$ and then stretch the $x$ component by a factor of $2$.

Our goal here is to find a matrix $T$ such that $w^{\beta} = T^{\beta}_{\alpha} v^{\alpha}$ where $\vec w$ is the result of rotating and scaling $\vec v$ as specified above. However, trying to find that matrix directly could be a hassle. Because we know that the composition of linear maps is equivalent to matrix multiplication, why don't we try to split $T$ into the product of $2$ matrices. Let's let $T=SR$, where $R$ rotates the vector $\vec v$ and then $S$ scales the result.

Let's look at the rotation first:
[Figure: a vector $A$ in the plane, at angle $\theta$ above the positive $x$-axis.]

Assume that the $A$ in the above image was originally pointing along the $x$ direction and then was rotated by angle $\theta$ to its current location. The coordinates of its new location are then $(\|A\|\cos(\theta),\|A\|\sin(\theta))$. So apparently our rotation matrix should transform $\begin{bmatrix} \|A\| \\ 0 \end{bmatrix} \mapsto \begin{bmatrix} \|A\|\cos(\theta) \\ \|A\|\sin(\theta)\end{bmatrix}$. Similarly (you should do it yourself) we could draw a picture to convince ourselves that our rotation matrix should transform $\begin{bmatrix} 0 \\ \|A\|\end{bmatrix} \mapsto \begin{bmatrix} -\|A\|\sin(\theta) \\ \|A\|\cos(\theta)\end{bmatrix}$.

By linearity, we know that this implies $R(1,0) = (\cos(\theta), \sin(\theta))$ and $R(0,1) = (-\sin(\theta), \cos(\theta))$. We now have enough information to construct our rotation matrix: $R = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta)\end{bmatrix}$.
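
If the picture doesn't convince you, here's a small numerical check (the angle and the length of $A$ are arbitrary):

```python
import numpy as np

theta = 0.7      # an arbitrary angle
norm_A = 3.0     # an arbitrary length for A

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# A vector of length ||A|| along the x-axis lands on (||A|| cos(theta), ||A|| sin(theta)) ...
print(R @ np.array([norm_A, 0.0]))
# ... and one along the y-axis lands on (-||A|| sin(theta), ||A|| cos(theta)).
print(R @ np.array([0.0, norm_A]))
```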

In our particular case, this means that $R=\begin{bmatrix} \cos(-\frac {\pi}4) & -\sin(-\frac {\pi}4) \\ \sin(-\frac {\pi}4) & \cos(-\frac {\pi}4)\end{bmatrix} = \begin{bmatrix} \frac 1{\sqrt{2}} & \frac 1{\sqrt{2}} \\ -\frac 1{\sqrt{2}} & \frac 1{\sqrt{2}}\end{bmatrix}$ where the negative sign comes from the fact that we're rotating our system clockwise whereas the direction of positive increase of angles is conventionally counterclockwise.

Now for the scaling matrix. By definition it takes any vector $(x,y) \mapsto (2x,y)$. I hope it should be fairly obvious that the matrix representation of it is $\begin{bmatrix} 2 & 0 \\ 0 & 1\end{bmatrix}$, but if not you can figure it out the same way as above.

Now that we have $S$ and $R$ we can find our matrix $T$ by just multiplying the two together: $$T = SR = \begin{bmatrix} 2 & 0 \\ 0 & 1\end{bmatrix}\begin{bmatrix} \frac 1{\sqrt{2}} & \frac 1{\sqrt{2}} \\ -\frac 1{\sqrt{2}} & \frac 1{\sqrt{2}}\end{bmatrix} = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ -\frac 1{\sqrt{2}} & \frac 1{\sqrt{2}}\end{bmatrix}$$
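
As a quick sanity check on the arithmetic, you can build $R$, $S$, and $T=SR$ with NumPy and compare with the matrix above (just a sketch of the same computation):

```python
import numpy as np

theta = -np.pi / 4                         # clockwise rotation by pi/4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                 # stretch the x component by a factor of 2

T = S @ R                                  # rotate first, then scale
print(T)
# [[ 1.41421356  1.41421356]
#  [-0.70710678  0.70710678]]  i.e. [[sqrt(2), sqrt(2)], [-1/sqrt(2), 1/sqrt(2)]]
```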

Using the Einstein summation convention, our transformation is then $w^{\alpha} = S^{\alpha}_{\beta}R^{\beta}_{\gamma}v^{\gamma} = T^{\alpha}_{\delta}v^{\delta}$. In general ALL active linear transformations of vectors will have the form $w^{\alpha} = T^{\alpha}_{\beta}v^{\beta}$, where $w^{\alpha}$ is the $\alpha$th component of the vector $\vec w$ (with respect to some basis -- in my example it's the standard basis), $T^{\alpha}_{\beta}$ is the component of the matrix $T$ in the $\alpha$th row and $\beta$th column, and $v^{\beta}$ is the $\beta$th component of the vector $\vec v$ with respect to the same basis as $\vec w$.

Let's go ahead and calculate the coordinates of our vector $\vec w$: $$\vec w = T\vec v = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ -\frac 1{\sqrt{2}} & \frac 1{\sqrt{2}}\end{bmatrix} \begin{bmatrix} v^1 \\ v^2 \end{bmatrix} = \begin{bmatrix} \sqrt{2}v^1 + \sqrt{2}v^2 \\ -\frac 1{\sqrt{2}}v^1 + \frac 1{\sqrt{2}}v^2\end{bmatrix} = (\sqrt{2}v^1 + \sqrt{2}v^2)\hat e_1 + (-\frac 1{\sqrt{2}}v^1 + \frac 1{\sqrt{2}}v^2)\hat e_2$$


Things to notice:

  1. Active transformations do not change the SYSTEM OF COORDINATES -- they only change the coordinates of A SINGLE VECTOR. This can be seen by the fact that the general formula for an active transformation of a vector is $w^{\alpha} = T^{\alpha}_{\beta}v^{\beta}$ -- nowhere in this formula do the basis vectors show up.
  2. In fact, an active transformation can be defined even when there is no coordinate system specified. I have used the standard basis as my coordinate system so that I can get actual numbers, but in general, we can define an active transformation without doing so.
  3. An active transformation must be a transformation of a space into itself. It would not be defined for more general transformations $T: V \to W$.

Now let's look at passive transformations. Passive transformations are when you have some object in your space (such as a vector), but instead of transforming that object, you transform the coordinate system around it. The coordinates of that object will then change with (or actually against) the change in the basis vectors so that the object itself stays the same. To contrast the two: in a passive transformation we transform the basis vectors themselves, and the coordinates of a vector change only as a consequence of keeping that vector the same; in an active transformation we transform the coordinates of a specific vector with respect to a fixed set of basis vectors.

So let's look at an example. We've covered rotation and scaling transformations. How about this time we do a shear transformation? Let's look at the transformation $\hat e_1 \mapsto \vec f_1 = \hat e_1 + \hat e_2$ and $\hat e_2 \mapsto \vec f_2 = \hat e_2$. I'll leave it to you to verify that the set $\{\vec f_1, \vec f_2\}$ is a basis for the plane.

We see that the matrix which corresponds to this transformation is $T=\begin{bmatrix} 1 & 1 \\ 0 & 1\end{bmatrix}$ because $$\begin{bmatrix} \vec f_1 \\ \vec f_2 \end{bmatrix} = \begin{bmatrix} \hat e_1 + \hat e_2 \\ \hat e_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1\end{bmatrix} \begin{bmatrix} \hat e_1 \\ \hat e_2 \end{bmatrix}$$

So let's see how the components of the vector $\vec v = v^{\alpha}\hat e_{\alpha}$ change when we represent it in the $\{\vec f_1, \vec f_2\}$ basis.

Well, first note that because it is a basis of the plane, we can represent the vector $\vec v$ as a linear combination of $\vec f_1, \vec f_2$. Therefore $\vec v = v^{\alpha}\hat e_{\alpha} = v^{\alpha'}\vec f_{\alpha'}$. Notice here that I use the notation $\alpha'$ to show that the components are with respect to a different basis. Also note that $\alpha$ and $\alpha'$ are different symbols and thus when they both appear in one term (as they will in a second), summation will NOT be implied (i.e. $a^x\vec b_{x'} \ne a^1\vec b_{1'} + a^2\vec b_{2'} + \cdots$).

We have $\vec v = v^{\alpha}\hat e_{\alpha} = v^{\alpha'}\vec f_{\alpha'} = v^{\alpha'}(T_{\alpha'}^{\alpha}\hat e_{\alpha}) = (T_{\alpha'}^{\alpha}v^{\alpha'})\hat e_{\alpha}$. Therefore we've found that $v^{\alpha} = T_{\alpha'}^{\alpha}v^{\alpha'}$.

And knowing that $\vec f_{\alpha'} = T_{\alpha'}^{\alpha}\hat e_{\alpha}$ gives us $\hat e_{\alpha} = [T^{-1}]^{\alpha'}_{\alpha}\vec f_{\alpha'}$ (note that a change of coordinates MUST be invertible). Therefore, by the same steps as in the above paragraph, we have that $v^{\alpha'} = [T^{-1}]^{\alpha'}_{\alpha}v^{\alpha}$.

Having found equations relating $v^{\alpha'}$ to $v^{\alpha}$ and $\hat e_{\alpha}$ to $\vec f_{\alpha'}$, we've completely characterized our transformation.

I'll leave it to you to actually calculate what the components of $\vec v$ are with respect to this new basis.
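
If you want to check your answer numerically, here's a small sketch with an arbitrary sample vector; it builds the matrix whose columns are the new basis vectors and solves for the new components, which sidesteps any row-versus-column convention:

```python
import numpy as np

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # the old (standard) basis
f1, f2 = e1 + e2, e2                                   # the sheared basis

F = np.column_stack([f1, f2])     # columns are the new basis vectors
v = np.array([3.0, -2.0])         # an arbitrary vector, written in the old basis

v_primed = np.linalg.solve(F, v)  # components of the SAME vector in the new basis
print(v_primed)                   # [ 3. -5.]

# The vector itself hasn't changed: v = v^{1'} f_1 + v^{2'} f_2
print(np.allclose(v_primed[0] * f1 + v_primed[1] * f2, v))   # True
```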


Things to note:

  1. A passive transformation is a transformation of the basis vectors. However, for a vector to stay the same, its coordinates will need to change as well to reflect the change of basis. Given the change of basis $\vec e_i \to \vec f_i$, for $i=1,2,\dots, n$, a given vector can be represented by either set of coordinates: $\vec v = v^{\alpha}\vec e_{\alpha} = v^{\alpha'}\vec f_{\alpha'}$.
  2. The general equations for this are: $\vec f_{\alpha'} = T_{\alpha'}^{\alpha}\hat e_{\alpha}$, $\ \ \hat e_{\alpha} = [T^{-1}]^{\alpha'}_{\alpha}\vec f_{\alpha'}$, $\ \ v^{\alpha} = T_{\alpha'}^{\alpha}v^{\alpha'}$, and $v^{\alpha'} = [T^{-1}]^{\alpha'}_{\alpha}v^{\alpha}$.
  3. Each of those four above equations is enough to derive the others, so you wouldn't need to memorize them (though it's not particularly hard given their symmetric nature).
  4. Our space must be equipped with a coordinate system (set of basis vectors) for a passive transformation to make sense.

So far, we've only talked about linear transformations. An affine transformation can be described by a linear transformation plus a translation.

For fun, let's see why it doesn't matter whether we perform the translation or the linear transformation first: if we have some vector $\vec v$ and we'd like to apply the affine transformation $G$ to it, then we could represent it as $G(\vec v) = L(\vec v) + \vec t$, where $L$ is a linear transformation (one that can be represented by a matrix) and $\vec t$ is a translation vector. It could also be represented as $G(\vec v) = L(\vec v + \vec s)=L(\vec v) + L(\vec s)$, where we can see that $L(\vec s) = \vec t$. Thus the two forms are equivalent.
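
Here's a quick numerical illustration of that equivalence (the map, translation, and vector are all made up, and $\vec s = L^{-1}\vec t$ assumes $L$ is invertible):

```python
import numpy as np

L = np.array([[2.0, 1.0],
              [0.0, 3.0]])        # an arbitrary invertible linear map
t = np.array([1.0, -4.0])         # an arbitrary translation vector
v = np.array([0.5, 2.0])          # an arbitrary input vector

s = np.linalg.solve(L, t)         # s = L^{-1} t, so that L(s) = t

# "transform, then translate" agrees with "translate by s, then transform"
print(np.allclose(L @ v + t, L @ (v + s)))   # True
```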

Your system of equations is a representation of an affine transformation. We could represent it in matrix form as $$\begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} = \begin{bmatrix} b^1_{1'} & b^1_{2'} &b^1_{3'} \\ b^2_{1'} & b^2_{2'} &b^2_{3'} \\ b^3_{1'} & b^3_{2'} &b^3_{3'} \end{bmatrix}\begin{bmatrix} x^{1'} \\ x^{2'} \\ x^{3'} \end{bmatrix} + \begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix}$$

Or in index notation as: $$x^{\alpha} = b^{\alpha}_{\alpha'} x^{\alpha'} + a^{\alpha}$$

(Do you see how this notation is much cleaner?)

The linear part works exactly as normal. And the translation part should be pretty clear as well -- after the vector is transformed linearly, it is translated by $\begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix}$.
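
Here's a concrete sketch of that formula; all the numbers are made up, `B` plays the role of the matrix of $b^{\alpha}_{\alpha'}$, and `a` is the translation:

```python
import numpy as np

# Made-up data: a rotation about the x_3 axis as the direction-cosine matrix,
# and an arbitrary translation a (the coordinates of O' in K).
theta = 0.3
B = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
a = np.array([1.0, 2.0, 3.0])

x_primed = np.array([4.0, -1.0, 0.5])   # coordinates of a point P in K'

# x^alpha = b^alpha_{alpha'} x^{alpha'} + a^alpha
x = B @ x_primed + a
print(x)                                # coordinates of the same point P in K
```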

This is technically enough to specify your transformation, but notice that since this is a passive transformation, it'd be nice if we had a formula relating the basis vectors of the $K$ and $K'$ systems of coordinates. I'll leave this as an exercise for you. Notice that this will also be an affine transformation, so if you can't figure it out, let me know. HINT: if the coordinates of the vector are translated in the direction $\vec a$, in which direction should the basis vectors be translated so that the vector stays the same?


Things to notice:

  1. Technically every linear transformation is an affine transformation. We can see this by setting the translation vector $\vec t=0$.
  2. However, we generally consider affine spaces to not contain the origin. Remember that vector spaces always contain the origin and a linear mapping always maps the origin to the origin. Thus, we wouldn't generally want to treat a space we know is a vector space as an affine space, or we'd lose some tools for dealing with it.

NOTE: I have not focused on affine transformations here because it seems your problems are more with the linear parts of these transformations. If you'd like more help in understanding affine transformations, let me know and I can go further into it.


Now, let me try to take a crack at your questions:

First question: Why are the elements $b^i_j$ the cosines of the angles between the $i$th basis vector of $K$ and the $j$th basis vector of $K'$?

Let's focus on the linear transformation $$\begin{bmatrix} x^1 \\ x^2 \\ x^3 \end{bmatrix} = \begin{bmatrix} b^1_{1'} & b^1_{2'} & b^1_{3'} \\ b^2_{1'} & b^2_{2'} & b^2_{3'} \\ b^3_{1'} & b^3_{2'} & b^3_{3'} \end{bmatrix}\begin{bmatrix} x^{1'} \\ x^{2'} \\ x^{3'} \end{bmatrix}$$ This seems to be your problem, and the translation part of your original problem isn't super difficult.

Notice that $K =\{\hat e_1, \hat e_2, \hat e_3\}$ and $K' = \{\hat f_{1'}, \hat f_{2'}, \hat f_{3'}\}$ (I've changed your notation for the basis vectors of $K'$ slightly) are both orthonormal bases. So let's consider an arbitrary vector $\vec x \in \Bbb R^3$. Because $K$ and $K'$ are both bases of $\Bbb R^3$, we can represent $\vec x$ as a linear combination of either basis: $\vec x = x^{\alpha}\hat e_{\alpha} = x^1\hat e_1 + x^2\hat e_2 + x^3\hat e_3 = x^{\alpha'}\hat f_{\alpha'} = x^{1'}\hat f_{1'} +x^{2'}\hat f_{2'} + x^{3'}\hat f_{3'}$. What we'd like to get here is the relationship between each $x^{\alpha}$ and $x^{\alpha'}$.

We can see from the above matrix equation (and from our earlier discussion on passive transformations) that the relationship should be $x^{\alpha} = b^{\alpha}_{\alpha'}x^{\alpha'}= b^{\alpha}_{1'}x^{1'} + b^{\alpha}_{2'}x^{2'} + b^{\alpha}_{3'}x^{3'}$. So what we need to do is figure out what these $b^{\alpha}_{\alpha'}$'s are.

To do that, let's take the dot product of $\vec x$ with $\hat e_{1}$. Then we have $\hat e_1 \cdot \vec x = \hat e_1 \cdot (x^1\hat e_1 + x^2\hat e_2 + x^3\hat e_3) = \hat e_1 \cdot (x^{1'}\hat f_{1'} +x^{2'}\hat f_{2'} + x^{3'}\hat f_{3'})$. Let's take this one step at a time. The left equality gives us $\hat e_1 \cdot \vec x = x^1(\hat e_1 \cdot \hat e_1) + x^2(\hat e_1 \cdot \hat e_2) + x^3(\hat e_1 \cdot \hat e_3) = x^1(1) + x^2(0) + x^3(0) = x^1$. The ones and zeroes come from the fact that $K$ is an orthonormal basis set.

The right equality gives us $\hat e_1 \cdot \vec x = x^{1'}(\hat e_1 \cdot \hat f_{1'}) + x^{2'}(\hat e_1 \cdot \hat f_{2'}) + x^{3'}(\hat e_1 \cdot \hat f_{3'})$. Here the dot products don't just give us ones and zeroes because $\hat e_1$ is not necessarily orthogonal or parallel to any of the $\hat f_{\alpha'}$'s. So we'll have to evaluate them with the identity $\vec a \cdot \vec b = \|\vec a\|\|\vec b\|\cos(\theta_{a,b})$, where $\theta_{a,b}$ is the angle between the vectors $\vec a$ and $\vec b$.

In this case $\hat e_1 \cdot \hat f_{\alpha'} = (1)(1)\cos(\theta_{1,\alpha'})$, where $\theta_{1,\alpha'}$ is the angle between $\hat e_1$ and $\hat f_{\alpha'}$. Thus we have $\hat e_1 \cdot \vec x = x^{1'}\cos(\theta_{1,1'}) + x^{2'}\cos(\theta_{1,2'}) + x^{3'}\cos(\theta_{1,3'})$.

Putting these two equalities back together, we have $x^1 = x^{1'}\cos(\theta_{1,1'}) + x^{2'}\cos(\theta_{1,2'}) + x^{3'}\cos(\theta_{1,3'})$. Notice that this is of the form $x^1 = b^1_{1'}x^{1'} + b^1_{2'}x^{2'} + b^1_{3'}x^{3'}$. Because the values of $b^{\alpha}_{\alpha'}$ for each $\alpha$ and $\alpha'$ are unique, we know that $b^1_{1'} = \cos(\theta_{1,1'})$, $b^1_{2'} = \cos(\theta_{1,2'})$, and $b^1_{3'} = \cos(\theta_{1,3'})$. We'd get similar results by taking the dot product of $\vec x$ with $\hat e_2$ and $\hat e_3$, so we can say that in general $b^{\alpha}_{\alpha'} = \cos(\theta_{\alpha,\alpha'})$, where, remember, $\cos(\theta_{\alpha,\alpha'})$ is just the cosine of the angle between $\hat e_{\alpha}$ and $\hat f_{\alpha'}$.
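
Since both bases consist of unit vectors, the cosine of the angle between two of them is just their dot product, so you can check this claim numerically: build any rotated orthonormal basis, form the matrix of dot products, and verify that it carries the primed coordinates to the unprimed ones (the particular rotation below is arbitrary):

```python
import numpy as np

# Old basis: the standard one.  New basis: the old one rotated about the x_3 axis.
theta = 0.9
Q = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
e = np.eye(3)                                  # rows e[i] are the hat-e's
f = np.array([Q @ e[j] for j in range(3)])     # rows f[j] are the hat-f's

# b^alpha_{alpha'} = hat-e_alpha . hat-f_{alpha'}  (= cos of the angle, since both are unit vectors)
B = np.array([[np.dot(e[i], f[j]) for j in range(3)] for i in range(3)])

# Pick primed coordinates, build the vector, and check x = B x'.
x_primed = np.array([1.0, 2.0, 3.0])
x = sum(x_primed[j] * f[j] for j in range(3))  # the vector, expressed in K coordinates
print(np.allclose(B @ x_primed, x))            # True
```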

Second Question: How does the equation $H_j' = H_1\,i_j' \cdot i_1 + H_2\,i_j' \cdot i_2 + H_3\,i_j' \cdot i_3$ (where $j = 1,2,3$) make sense when the dot product produces a scalar?

Hopefully by now you see that both the left- and right-hand sides are in fact scalars. The LHS is the $j$th component of the vector $H$ in the primed basis, and the RHS is an expression for calculating this component in terms of the components of $H$ in the unprimed basis. The scalar $H^a\hat f_{b'} \cdot\hat e_a$ is just the component $H^a$ multiplied by the cosine of the angle between the vectors $\hat f_{b'}$ and $\hat e_a$ (I've changed the notation to mine so that you can see how my notation relates to yours) -- where summation over $a$ is implied by the Einstein convention.
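
The same kind of numerical check works here too: the right-hand side really does produce the scalar components of $H$ in the primed basis (again, the rotation and the vector are arbitrary choices):

```python
import numpy as np

theta = 0.4
Q = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
e = np.eye(3)                                  # rows are the unprimed unit vectors
f = np.array([Q @ e[j] for j in range(3)])     # rows are the primed unit vectors

H = np.array([2.0, -1.0, 0.5])                 # components of H in the unprimed basis

# H'_j = H_1 (f_j . e_1) + H_2 (f_j . e_2) + H_3 (f_j . e_3)  -- one scalar for each j
H_primed = np.array([sum(H[i] * np.dot(f[j], e[i]) for i in range(3)) for j in range(3)])

# Sanity check: each H'_j is just the projection of H onto the unit vector f_j.
print(np.allclose(H_primed, f @ H))            # True
```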