Motivation for construction of cross-product (Quaternions?)

I'm trying to present a narrative that brings the (3D) Cross Product into existence.

"Given two vectors $\mathbf u$, $\mathbf v$, how to construct a vector perpendicular to both?"

... looks like a good motivating problem.

So, given two vectors u, v we must create a third vector w s.t. $\mathbf w \cdot \mathbf u = \mathbf w \cdot \mathbf v = 0$. This gives:

$$\mathbf w = \lambda (u_2 v_3 - u_3 v_2, - u_1 v_3 + u_3 v_1, u_1 v_2 - u_2 v_1)$$

... and, taking $\lambda=1$, then to establish the Cross Product Rule we would need to show that $|\mathbf w|$ gives the (signed) area of the parallelogram formed by $u$ and $v$.

But this approach seems to require several lines of moderately dense algebra.

I tried looking for a geometrical approach and found a handful of links:

Cross product as result of projections
Explanation of a cross product result

However, I'm struggling to see it. (EDIT: nearly there: Geometric understanding of the Cross Product)

Another potentially interesting approach is Quaternions. I found a very interesting article here: http://www.johndcook.com/blog/2012/02/15/dot-cross-and-quaternion-products/

The author observes that by defining i,j,k s.t. $i^2=j^2=k^2=ijk=-1$, multiplying two pure quaternions $\mathbf u=0+bi+cj+dk$ and $\mathbf v=0+fi+gj+hk$ gives:

$$\mathbf u \mathbf v = -(\mathbf u \cdot \mathbf v) + \mathbf u \times \mathbf v$$

So my question is: On account of this succinctness, should the Quaternion approach be considered the origin of the Cross Product? Can anyone present an elegant narrative that brings the Cross Product into existence?

PS Links to investigate:
http://math.oregonstate.edu/bridge/papers/dot+cross.pdf
https://en.wikiversity.org/wiki/Cross_product
https://www.physicsforums.com/threads/explanation-of-the-cross-product.513287/
http://behindtheguesses.blogspot.co.uk/2009/04/dot-and-cross-products.html


Before meeting the cross product, students will have already met the dot product. This is a form of multiplication that takes two vectors and gives you a scalar. It's natural to wonder whether there's also some kind of multiplication that takes two vectors and gives you another vector.

Obviously we could just write down any old function and call it "the cross product". But in order for it to actually be a nice form of multiplication there are a few properties that we would expect it to have:

$$\begin{align*} (\lambda\mathbf u) \times \mathbf v=\lambda(\mathbf u \times \mathbf v)=\mathbf u \times (\lambda\mathbf v)\\ \mathbf u \times (\mathbf v+\mathbf w)=\mathbf u \times \mathbf v+\mathbf u \times \mathbf w\\ (\mathbf u + \mathbf v)\times\mathbf w=\mathbf u \times \mathbf w+\mathbf v \times \mathbf w \end{align*}$$

(these properties are called "being bilinear").

There's one more property that we require in order for the cross product to make sense geometrically, which is that it shouldn't depend on which way you are looking at your problem. If we rotate our vectors and then take their cross product we should get the same answer as if we take their cross product and then rotate. Otherwise people looking at the same problem from different angles would get different answers! In symbols we represent the rotation by some orthogonal matrix $M$ with determinant $1$, and say that for each such matrix we want the following property:

$$M(\mathbf u\times\mathbf v)=(M\mathbf u)\times(M\mathbf v)$$

(This is called "invariance" or sometimes "covariance". Notice that the dot product also makes geometric sense in this way. If you rotate two vectors and then take their dot product you get the same answer as you would have gotten before the rotation. Or in other words $(M\mathbf u)\cdot(M\mathbf v)=u\cdot v$.)

Now here's the clever bit: I claim that the cross product is the only function with these properties. This explains why the cross product is interesting: it's the only form of multiplication that makes any sense at all! (Actually there are also the functions like $\lambda\mathbf u\times\mathbf v$ that are just scalings of the cross product by some factor, but each of these functions can be written in terms of any of the others and so we can just pick the usual cross product to be our favourite, and work in terms of that.)


Proof

I'll show that if we have a function, $\times$, that is bilinear and invariant (i.e. it obeys the four equations listed above) then it is in fact the cross product. We'll work in terms of the usual basis vectors $\mathbf i$, $\mathbf j$ and $\mathbf k$.

First we'll try to work out what $\mathbf i \times \mathbf j$ is. Let $M$ be the rotation of $180^\circ$ about the $k$-axis. Then $M\mathbf i=-\mathbf i$ and $M\mathbf j=-\mathbf j$. So we have

$$M(\mathbf i \times \mathbf j)=(M\mathbf i)\times(M\mathbf j)=(-\mathbf i)\times(-\mathbf j)=\mathbf i\times\mathbf j$$

(the last step used bilinearity to move the minus signs out so they could cancel). This means that $\mathbf i \times \mathbf j$ is fixed by the rotation $M$, and so must lie on the $\mathbf k$-axis. As I said before, we're going to allow ourselves to pick our favourite scaling, so since we know $\mathbf i \times \mathbf j$ is on the $\mathbf k$-axis we might as well assume that $\mathbf i \times \mathbf j=\mathbf k$.

There's a rotation (of $120^\circ$ about $\mathbf i+\mathbf j+\mathbf k$) that takes $\mathbf i$ to $\mathbf j$, $\mathbf j$ to $\mathbf k$, and $\mathbf k$ to $\mathbf i$. Applying invariance under this matrix to our equation $\mathbf i \times \mathbf j=\mathbf k$ gives us $\mathbf j \times \mathbf k=\mathbf i$. Applying it again gives $\mathbf k \times \mathbf i=\mathbf j$.

There's also a rotation (of $180^\circ$ about $\mathbf i+\mathbf j$) that takes $\mathbf i$ to $\mathbf j$, $\mathbf j$ to $\mathbf i$, and $\mathbf k$ to $-\mathbf k$. Applying invariance under this matrix to our equation $\mathbf i \times \mathbf j=\mathbf k$ gives us $\mathbf j \times \mathbf i=-\mathbf k$. Similarly we have $\mathbf k \times \mathbf j=-\mathbf i$ and $\mathbf i \times \mathbf k=-\mathbf j$.

Finally we want to know what $\mathbf i \times \mathbf i$ is. Let $M$ be the $180^\circ$ rotation about the $\mathbf k$-axis, as before. Then

$$M(\mathbf i \times \mathbf i)=(M\mathbf i)\times(M\mathbf i)=(-\mathbf i)\times(-\mathbf i)=\mathbf i\times\mathbf i$$

so $\mathbf i \times \mathbf i$ is fixed by $M$ and therefore lies on the $\mathbf k$-axis. But the same argument applied with a rotation about the $\mathbf j$-axis shows that $\mathbf i \times \mathbf i$ lies on the $\mathbf j$-axis too! These two axes only intersect at $\mathbf 0$. So $\mathbf i \times \mathbf i=\mathbf 0$ and by the same argument $\mathbf j \times \mathbf j=\mathbf 0$ and $\mathbf k \times \mathbf k=\mathbf 0$.

Now since we know how to cross product any two basis vectors we can calculate the cross product of any two vectors by multiplying out (using bilinearity):

$$\begin{align*}(u_i\mathbf i+u_j\mathbf j+u_k\mathbf k)\times(v_i\mathbf i+v_j\mathbf j+v_k\mathbf k)= &u_iv_i\mathbf i\times\mathbf i+u_iv_j\mathbf i\times\mathbf j+u_iv_k\mathbf i\times\mathbf k\\ +&u_jv_i\mathbf j\times\mathbf i+u_jv_j\mathbf j\times\mathbf j+u_jv_k\mathbf j\times\mathbf k\\ +&u_kv_i\mathbf k\times\mathbf i+u_kv_j\mathbf k\times\mathbf j+u_kv_k\mathbf k\times\mathbf k\\ =(u_jv_k-u_kv_j)\mathbf i+(u_kv_i-u_iv_k)&\mathbf j+(u_iv_j-u_jv_i)\mathbf k \end{align*}$$

This is the formula for the cross product.


The above was a rewrite of my original answer which said more or less the same thing as above but in more formal terms. I'll put my original answer here because I think some people reading this might like to see the technical details:

Given a $3$-dimensional oriented real inner-product space $V$ the group of symmetries preserving the inner-product and orientation is $\mathrm{SO}(V)$. The invariant tensors under $\mathrm{SO}(V)$ are $\delta_{ij}$, $\delta^{ij}$, and $\varepsilon_{ijk}$, along with the things they generate like $\delta^{ij}\varepsilon_{klm}$ and so on.

The tensors $\delta_{ij}$ and $\delta^{ij}$ are the inner-product and the inner-product induced on the dual space. These aren't very interesting because we defined $\mathrm{SO}(V)$ to preserve these, so we already knew that we were going to get them. But the tensor $\varepsilon_{ijk}$ is in some sense new. Therefore we are motivated to investigate $\varepsilon_{ijk}$ or equivalently the bilinear map $V\times V\rightarrow V$ given by $(v\times w)^i=\delta^{ij}\varepsilon_{jkl}v^kw^l$. This is the cross product.


Here is how I usually introduce the cross product.

Given two vectors $u=(u_1,u_2,u_3), v=(v_1,v_2,v_3)$ we seek another vector $w=(x,y,z)$ which is perpendicular to both. This means

$$u \cdot w= 0 \Rightarrow u_1x+u_2y+u_3z=0 \\ v \cdot w =0 \Rightarrow v_1x+v_2y+v_3z=0 $$

Now, all you have to do is solve this system of equations.Under the extra assumption that $u,v$ are linearly independent, the solution is exactly $t (u \times v)$.

One neat way to see this, is to start assuming that $u_1v_2-u_2v_1 \neq 0$ (or similarly for any pair of indices, which happens at least once for linearly independent vectors), and solve $$u_1x+u_2y=-u_3z \\ v_1x+v_2y=-v_3z $$ using Cramer's rule. The formulas for the cross product pop out immediately.


To me, the wedge product is more intuitive than the cross product. And as it turns out, geometric algebra provides exactly the way to get a vector perpendicular to the bivector $u\wedge v$: taking the algebraic dual. So I define the cross product as $$\bbox[5px,border:2px solid red]{u\times v := (u\wedge v)I^{-1}}$$ where $I$ is the positively oriented unit pseudoscalar of $\Bbb R^3$ (with the usual orientation).

enter image description here

The great thing about this definition is that it immediately tells us that the norm of the cross product is $\|u\|\|v\|\sin(\theta)$ and that it is orthogonal to both $u$ and $v$ (from the properties of the wedge product and the algebraic dual, respectively). So essentially we have to do the derivation you asked for in reverse. We'd like to show that this definition produces the correct components of the cross product. Here's a (pedantically) step-by-step derivation:

$$\begin{align}u\wedge v &= (u_1e_1 + u_2e_2+u_3e_e)\wedge (v_1e_1+v_2e_2+v_3e_3) \\ &= u_1v_1(e_1\wedge e_1) + u_1v_2(e_1\wedge e_2) + u_1v_3(e_1\wedge e_3) + \cdots \\ &= 0 + u_1v_2(e_1\wedge e_2) - u_1v_3(e_3\wedge e_1) + \cdots \\ &= (u_2v_3-u_3v_2)e_2\wedge e_3 + (u_3v_1-u_1v_3)e_3\wedge e_1 + (u_1v_2-u_2v_1)e_1\wedge e_2\end{align}$$

$$\begin{align}(u\wedge v)I^{-1} &= \big[(u_2v_3-u_3v_2)e_2\wedge e_3 + (u_3v_1-u_1v_3)e_3\wedge e_1 + (u_1v_2-u_2v_1)e_1\wedge e_2\big](e_3\wedge e_2\wedge e_1) \\ &= \big[(u_2v_3-u_3v_2)e_2e_3 + (u_3v_1-u_1v_3)e_3e_1 + (u_1v_2-u_2v_1)e_1e_2\big](e_3e_2e_1) \\ &= (u_2v_3-u_3v_2)e_2e_3e_3e_2e_1 + (u_3v_1-u_1v_3)e_3e_1e_3e_2e_1 + (u_1v_2-u_2v_1)e_1e_2e_3e_2e_1 \\ &= (u_2v_3-u_3v_2)e_1 + (-1)^2(u_3v_1-u_1v_3)e_3e_3e_2e_1e_1 + (-1)^4(u_1v_2-u_2v_1)e_1e_1e_2e_2e_3 \\ &= (u_2v_3-u_3v_2)e_1 + (u_3v_1-u_1v_3)e_2 + (u_1v_2-u_2v_1)e_3\end{align}$$

as we expected.


Note: The above seems to skirt the issue by simply replacing the cross product with another: the wedge product. But I'd argue any day that the wedge product is very well motivated, intuitive, and way more useful than the cross product. So while this definition does require a student to spend a little longer learning about the wedge product and dual than if (s)he just learned the cross product, the result should be a much better understanding of the geometric aspects of linear algebra.


$w \cdot (u \times v)$ can be drawn visually in 3D and you can give a visual proof that it equals the volume of a parallelopiped with sides $w,u,v$. Let's represent the set of all parallelopipeds as $P = \{(w,u,v) : w,u,v \in \Bbb{R}^3\}$. Volume is a map $\nu$ sending $P$ to $\Bbb{R}$ such that $\nu(w,u,v) = 0 \iff w,u,v$ are not linearly independent. Prove that $w\cdot (u\times v)$ follows the same exact rule and so does $\mu(w, u,v) = \det \begin{pmatrix} w_x & w_y & w_z \\ u_x & u_y & u_z \\ v_x & v_y & v_z\end{pmatrix}$, a result from linear algebra. Somehow using other properties of $\det$, show that $\nu$ must be equal to $\mu$. I leave the rest of the derivation to you.


The geometric property of the cross product can be derived from Lagrange's identity. You may not prefer this route as not being that "elegant", but (1) combined with N.S.'s answer this completes a full derivation of the cross-product from scratch and (2) the geometric property of the dot product is also derived in a way that involves a lot of algebra along with a special identity (the simplest derivation I know of follows from the Law of Cosines).

\begin{align} |u \times v|^2 &= (|u||v|)^2-(u \boldsymbol{\cdot}v)^2 \\ &= (|u||v|)^2 - (|u||v|\cos(\theta))^2\\ &= (|u||v|)^2 - (|u||v|)^2\cos(\theta)^2\\ &= (|u||v|)^2(1-\cos^2(\theta))\\ &= (|u||v|)^2\sin^2(\theta)\\ |u \times v| &= (|u||v|)\sin(\theta). \end{align}

If you want to know how to derive Lagrange's Identity, simply write out all of the terms and notice the simple cancelation pattern. With regards to the now deleted user's answer, Lagrange's identity can also be formulated in terms of the wedge product.

Also, yes, there is a narrative which brings the cross product into existence, and it is very interesting. The development of quaternions occurred in tandem with developments in physics towards the end of the 19th century. As far as your question is concerned, it should be said from here that the cross product is partly a physically defined operator more than a purely mathematically defined construction. Yes, Hamilton discovered quaternions by rigorously searching for a number system which held certain pure mathematical properties. However, as far as I know, it was only those practical results in physics, particularly within the fields of electromagnetism and mechanics, which led to the common practice of separating the real and complex parts of quaternions. It was here, as you probably already know, that vector algebra left quaternions behind because it became apparent just how practical it could be to construct separate operations which yielded the real and scalar parts of multiplied quaternions. Through sheer luck and physical experiment, these operations were found to yield more yieldy equations than before thought possible. If you want to know what I really mean when I say this, I suggest you read into Maxwell's equations in both forms (quaternionic and modern) as well as some of the published works of Heaviside, Clifford, and Gibbs. And if you're like me and what you've really been searching for now is some direct algebraic "jump" from quaternions immediately into vector cross products, you'll be sorry to hear that such a mathematical technique or formula does not and cannot exist so succinctly.

Josiah Willard Gibbs, in Elements of Vector Analysis, says: "The direction of α✕β may also be defined as that in which an ordinary screw advances as it turns to as to carry α towards β. Again, if α be directed toward the east, and β lie in the same horizontal plane and on the north side of α, α✕β will be directed upwards."

From Gibbs, we get a very dry, mere definition of the cross product operation. Yet where does he get it from? And how about the Right Hand Rule? As it seems to me, it was just built into the very definition of the cross product for the physics work of the time. (In following N.S.' answer, my final equations had ±'s in them, suggesting two possible answers and a thus "freedom of interpretation" wherein this rule is able to enter the calculus more as a definition). However, only from this motivated, pragmatic, physics-and-quaternions-based definition first can we move on to N.S.' answer and then on to Lagrange's Identity to derive the geometric properties and exact formula. The definition seems to have come out of the quaternion math of the time first, then all those perpendicular/parallelogram/etc. properties proven rigorously later.

If this answer wasn't satisfying enough for you and you would like to really dig deep and trace the trail between quaternions and cross products in its full entirety, here's a great "jump"-ing off point.