Just a remark concerning 'problem 2':

The coordinates of the cross product $\bf{a}\times\bf{b}$ are the determinants of the projections of $\bf{a}$ and $\bf{b}$ onto the coordinate planes. So the $x$-coordinate of $\bf{a}\times\bf{b}$ is the area of the parallelogram spanned by the projections of $\bf{a}$ and $\bf{b}$ onto the $yz$-plane. I hope this helps your intuition a bit.
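
As a quick sanity check of this remark, here is a minimal Python sketch. The helper names `det2` and `cross_via_projections` are made up for illustration. It builds each coordinate from the 2x2 determinant of the projected vectors and compares against the usual component formula. Note that the determinants are signed areas, so a coordinate can be negative.

```python
def det2(p, q):
    """Determinant of the 2x2 matrix with rows p and q (signed parallelogram area)."""
    return p[0] * q[1] - p[1] * q[0]

def cross_via_projections(a, b):
    """Cross product assembled from the three coordinate-plane projections."""
    ax, ay, az = a
    bx, by, bz = b
    return (
        det2((ay, az), (by, bz)),  # projections onto the yz-plane give the x-coordinate
        det2((az, ax), (bz, bx)),  # projections onto the zx-plane give the y-coordinate
        det2((ax, ay), (bx, by)),  # projections onto the xy-plane give the z-coordinate
    )

def cross(a, b):
    """Textbook component formula, for comparison."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a, b = (1, 2, 3), (4, 5, 6)
print(cross_via_projections(a, b), cross(a, b))
```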


Let me take a stab at the intuition behind the definition of the cross product.

First, note that the cross product is a construct that is specific to three dimensions (it can technically be defined in seven dimensions too, but there the construction is not unique). Let's first see what is special about three dimensions. The idea is to define a binary operator that takes two vectors and produces another one. Two (non-parallel) vectors define a plane, and only in three dimensions can the direction and orientation of a $2$D-plane be uniquely determined by a vector perpendicular to it. That is, in $3$D there is a natural map between planes through the origin (defined by pairs of non-parallel vectors) and vectors up to a scalar multiple. In higher dimensions, for every $2$D-plane, there are multiple non-parallel vectors that are perpendicular to the plane, and for every vector, there are multiple planes passing through the origin that are perpendicular to that vector. Therefore, in higher dimensions there is no natural (unique up to a scalar multiple) mapping between planes through the origin (read: pairs of vectors) and vectors.

This duality between planes and vectors perpendicular to them in $3$D allows us to identify otherwise-more-complicated-mathematical-objects with vectors (which we know have nice properties), and that is the main motivation behind defining the cross product. There are two examples of such objects that I think best motivate the definition of the cross product: (1) surface area and (2) rotation. The first one is mentioned by the OP and is the subject of the first question, and the second one is really the motivation behind the use of the cross product for the magnetic field, which is again mentioned by the OP. Let us start with surface area:

Cross Product as Area Element

I am going to motivate this using the following example. Suppose you have a continuous fluid moving with some velocity field $\vec v(\vec r)$ in $3$D space. And suppose you have a surface area (possibly curved) and you want to know how much fluid is going through this surface per unit time. Obviously, if the velocity field is completely parallel to the surface, the fluid is not going through it, so it is not hard to convince yourself (draw a picture) that the flux density of the fluid through the surface is given by the component of $\vec v$ perpendicular to the surface. So the total flux would be given by the integral of the perpendicular component of $\vec v$ on the surface.

Since the direction of the surface everywhere can be determined by its normal vector $\hat n$, we can find the net flow rate through the surface as $$ \text{Flow rate}=\int_{\text{surface}}\vec v \cdot \hat n \,dS $$

Let us parametrize this surface with two parameters $s$ and $t$ such that all the points on the surface can be written as $\vec r=(x(s,t),y(s,t),z(s,t))$. At every point with parameters $(s,t)$ on the surface, the two vectors $\partial \vec r/\partial s$ and $\partial\vec r/\partial t$ are tangent to the surface. Then the normal vector to this surface is parallel to the cross product of these two vectors. Fortunately for us, the area element $dS$ is also proportional to the length of the cross product so that we can write the combination $\hat n\, dS$ as $$ \hat n\,dS = \frac{\partial \vec r}{\partial s}\times \frac{\partial \vec r}{\partial t} \,ds\,dt. $$ This allows us to reduce the surface integral to an ordinary double integral over the coordinates $s$ and $t$: $$ \text{Flow rate}=\int_{\text{surface}}\vec v \cdot \hat n \,dS = \int\int\vec v(s,t) \cdot \left(\frac{\partial \vec r}{\partial s}\times \frac{\partial \vec r}{\partial t} \right)\,ds\,dt. $$ What made this problem easy is that the area element, which is defined by the pair of vectors $(\partial \vec r/\partial s,\, \partial\vec r/\partial t)$, can be identified with a single vector, so that when we need the component of $\vec v$ normal to the surface, we simply take the dot product of $\vec v$ with the area element. If we did not have the luxury of identifying the area element with a single vector, we would have to manually find the component of $\vec v$ that is perpendicular to both vectors defining the surface.
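
To make the double integral concrete, here is a rough numerical sketch in Python. Everything in it is an illustrative assumption rather than part of the argument above: the function name `flux`, the choice of the upper unit hemisphere as the surface, the uniform field $\vec v=(0,0,1)$, and the use of central differences plus a midpoint rule. For this particular choice the integral can be done by hand and equals $\pi$, so the printed value should be close to that.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def flux(v, r, s_range, t_range, n=100, h=1e-6):
    """Midpoint-rule estimate of the integral of v . (dr/ds x dr/dt) ds dt."""
    s0, s1 = s_range
    t0, t1 = t_range
    ds, dt = (s1 - s0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * ds
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            # Tangent vectors by central differences
            r_s = tuple((p - q) / (2 * h) for p, q in zip(r(s + h, t), r(s - h, t)))
            r_t = tuple((p - q) / (2 * h) for p, q in zip(r(s, t + h), r(s, t - h)))
            total += dot(v(r(s, t)), cross(r_s, r_t)) * ds * dt
    return total

def hemisphere(s, t):
    """Upper unit hemisphere: s is the polar angle, t the azimuth."""
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def uniform_up(point):
    """Uniform velocity field pointing along +z."""
    return (0.0, 0.0, 1.0)

flow = flux(uniform_up, hemisphere, (0.0, math.pi / 2), (0.0, 2 * math.pi))
print(flow, math.pi)
```

The order of the parameters matters: swapping $s$ and $t$ flips the sign of the cross product, and with it the sign of the computed flow.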

Note that this area element is the promised otherwise-more-complicated-mathematical-object that only in $3$D can be identified by a vector. In higher dimensions, this beast is known as the wedge product of the two vectors defining it and cannot be reduced to a single vector.

Cross Product as Rate of Rotation

We tend to think of proper rotations in $3$D in terms of an axis of rotation and an angle. The concept of the axis of rotation does not generalize to higher dimensions. What does generalize is the plane perpendicular to it. This plane perpendicular to the axis of rotation is invariant (it gets mapped to itself) under rotation. Higher-dimensional proper rotations can be identified in terms of the set of their invariant planes and the angles of rotations associated with each plane. It is the duality between planes and axes (map between pair of vectors and single vector perpendicular to them) that allows us to talk about axis of rotation in $3$D.

Imagine rotating a vector $\vec v$ at the rate of $\omega$ radians per second around an axis of rotation defined by the unit vector $\hat u$ perpendicular to some plane $P$. Since $P$ is the rotation plane, if we decompose $\vec v$ into its components $\vec v_\perp$ and $\vec v_{||}$, perpendicular and parallel to the plane $P$, $\vec v_\perp$ will not change under the rotation, while $\vec v_{||}$ stays in $P$ and rotates at the rate $\omega$. So the rate of change of $\vec v$ is given by the rate of change of $\vec v_{||}$. Since $\vec v_{||}$ is now in a $2$D plane, we can rewrite its rate of change in polar coordinates $$ \frac{d\vec v}{dt} = \frac{d\vec v_{||}}{dt} = \frac{d\left|\vec v_{||}\right|}{dt}\hat r+\left|\vec v_{||}\right| \omega \hat\theta = \left|\vec v_{||}\right| \omega \hat\theta $$ Here I used $d\vec r/dt = \dot r\hat r+r\dot\theta\hat\theta$ and the fact that rotation does not change the length of a vector. Without using the axis of rotation $\hat u$ it would be hard to find both $\left|\vec v_{||}\right|$ and the direction of $\hat\theta$ (which is in the plane $P$ and perpendicular to $\vec v_{||}$). But note that since $\hat\theta$ is in the plane $P$, it is also perpendicular to $\vec v_\perp$, which makes it perpendicular to $\vec v$. Additionally, it is perpendicular to $\hat u$ since $\hat u$ is perpendicular to $P$. And since we know something that is perpendicular to two vectors is parallel to their cross product, $\hat \theta$ is in the direction of $\hat u\times \vec v$.
Also $\left|\vec v_{||}\right|$ is $\left|\vec v\right| \sin(\phi)$ where $\phi$ is the angle between $\vec v$ and $\hat u$, which is exactly the length of $\hat u\times \vec v$, so we can write the rate of change of $\vec v$ as $$ \frac{d\vec v}{dt} =\omega\, \hat u\times \vec v $$ Again, since we have a direction $\hat u$ and a scalar $\omega$ for the rotation, we can define a vector $\vec \omega = \omega \hat u$ as a rotation vector (really the angular velocity vector) and say $$ \frac{d\vec v}{dt} =\vec \omega\times \vec v. $$ So if we identify rotations by vectors representing their axis of rotation and rate of rotation (rate of change of angle), the rate of change of any vector $\vec v$ under such a rotation is given by the cross product of these two vectors. In other words, the cross product is the so-called infinitesimal generator of rotations.
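
The relation $d\vec v/dt=\vec\omega\times\vec v$ can be checked numerically. The sketch below is only an illustration under my own assumptions: it rotates a vector with Rodrigues' rotation formula (not used in the text above, but a standard way to rotate about an arbitrary unit axis), takes a central finite difference in time, and compares with the cross product. The axis, rate and test vector are arbitrary choices.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def rotate(v, u, angle):
    """Rodrigues' formula: rotate v by `angle` radians about the unit axis u."""
    c, s = math.cos(angle), math.sin(angle)
    u_cross_v = cross(u, v)
    k = dot(u, v) * (1 - c)
    return tuple(c * vi + s * wi + k * ui for vi, wi, ui in zip(v, u_cross_v, u))

u = (1 / 3, 2 / 3, 2 / 3)   # unit axis (arbitrary choice)
omega = 2.0                 # radians per second (arbitrary choice)
v = (1.0, 2.0, 3.0)

# Central finite difference of the rotated vector at t = 0
h = 1e-6
ahead = rotate(v, u, omega * h)
behind = rotate(v, u, -omega * h)
numeric = tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))

# Prediction from the text: d v / dt = omega_vec x v
omega_vec = tuple(omega * ui for ui in u)
exact = cross(omega_vec, v)
print(numeric, exact)
```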

At this point, you should not be surprised that the vector $ \vec \omega$ itself is the cross product of two vectors in the plane $P$. You can take any unit vector in $P$ and look at its cross product with its rate of change and you will get $\vec\omega$. I leave that as an exercise.
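
For readers who want to check their answer, here is one possible route through the exercise. It is a sketch that assumes the standard triple product identity $\vec a\times(\vec b\times\vec c)=\vec b\,(\vec a\cdot\vec c)-\vec c\,(\vec a\cdot\vec b)$, and the symbol $\hat e$ for the unit vector in $P$ is my own notation. Since $\hat e$ rotates with the plane, $d\hat e/dt=\vec\omega\times\hat e$, and therefore $$ \hat e\times\frac{d\hat e}{dt} = \hat e\times(\vec\omega\times\hat e) = \vec\omega\,(\hat e\cdot\hat e)-\hat e\,(\hat e\cdot\vec\omega) = \vec\omega, $$ where the last step uses $\hat e\cdot\hat e=1$ and $\hat e\cdot\vec\omega=0$ (because $\hat e$ lies in $P$ and $\vec\omega$ is perpendicular to $P$).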

Magnetic Field as Axis of Rotation

The electric field is a very intuitive concept compared to the magnetic field. The magnitude of the electric field is the force per unit charge, and its direction is the direction of that force. So the electric field is simply the force per unit charge. On the other hand, the magnetic field is a vector that, if you take its cross product with $\vec v$ (the velocity of a moving test charge), gives you a force per unit charge. It does not seem to point in a physically meaningful direction in space. Moreover, if you think about how it is created, it is created by another moving charge (this is not always strictly true, but a moving charge does create a magnetic field). The magnetic field created by a moving charge with velocity $\vec u$ at position $\vec r$ is proportional to $\vec r \times \vec u$. That gives a magnetic force on another moving test charge that is in the direction of $\vec v \times (\vec r \times \vec u)$, which is back in the plane defined by $\vec r$ and $\vec u$.
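
The claim that the force lands back in the $\vec r$-$\vec u$ plane can be made explicit with the standard triple product identity (an identity I am bringing in here; it is not derived in this answer): $$ \vec v\times(\vec r\times\vec u) = \vec r\,(\vec v\cdot\vec u)-\vec u\,(\vec v\cdot\vec r). $$ The right-hand side is a linear combination of $\vec r$ and $\vec u$ only, so whatever the test velocity $\vec v$ is, the resulting vector lies in the plane spanned by $\vec r$ and $\vec u$.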

It seems that the direction of the magnetic field, which points out of the plane defined by $\vec r$ and $\vec u$ where all the action is happening, is not really a physically meaningful direction in space. It is a mere mathematical convenience to define it in that direction. But I encourage you to go back and read the previous section about rotation to see the similarities.

A moving charge at position $\vec r$ with velocity $\vec u$ creates a rotating force at the origin that attempts to rotate the velocity vector of any charge at the origin in the $\vec r$-$\vec u$ plane. But since in $3$D we can identify rotations with their axis instead of their plane, which we love to do because vectors are simpler mathematical objects with better-known rules than planes, we define this force in terms of a vector $\vec B$ that points out of this plane. That is the magnetic field. And since the rate of rotation is given by the cross product of the rotation vector with the vector being rotated, the rate of change of the velocity $\vec v$ of a test charge at the origin is proportional to $\vec B\times \vec v$ (the sign and the factor $q/m$ are fixed by the Lorentz force law $\vec F = q\,\vec v\times\vec B$).

Now let me attempt to actually answer OP's questions

1) Why wasn't the cross product defined as just this magnitude? Was the orthogonal vector just some convenient form of killing two birds with a stone (getting both the measure of perpendicularity and getting the normal vector to the plane spanned by $a$ and $b$)?

As I mentioned already, the cross product, whether it represents an area element or a rotation, is a property of two vectors, and you need the direction of the plane defined by them to have a full description of the mathematical object you are describing. In the case of area, we needed that direction to find the component of velocity perpendicular to the surface, and in the case of rotation, we obviously need to know the axis of rotation. It is easier to work with one vector than it is to work with the two original vectors because the rules of vector algebra make it easier to calculate things in terms of coordinates, like the dot product in the area calculation, or finding the rate of change of a rotating vector without using a full rotation matrix.

2) Is there any intuition that the components of cross product $a \times b$ are:
$ \langle(a_y b_z - a_z b_y), (a_z b_x - a_x b_z), (a_x b_y - a_y b_x) \rangle$?

Let us think of $a\times b$ as a rotation of $b$ by $a$. Since the cross product is linear, we can separate this into the sum of three independent rotations of $b$ around the components of $a$. Let us look at the component $a_z$. That is a rotation around the $z$ axis. The rate of change of $b$ due to rotation around the $z$ axis is only affected by its projection on the $xy$-plane, i.e. only the $b_x$ and $b_y$ components. Rotating the projection of $b$ on $x$ around $z$ creates a component along the $y$ direction, while rotating the projection of $b$ on $y$ around $z$ creates a component in the $-x$ direction. So the rate of change of $b$ due to rotation at the rate $a_z$ around $z$ is given by $$ (-a_z\,b_y,\, a_z\, b_x,\,0). $$ Repeat the same logic for rotation around $x$ and $y$ at rates $a_x$ and $a_y$.
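
Carrying out the "repeat the same logic" step gives three contributions whose sum is the component formula. The short Python sketch below (with function names I made up for illustration) writes the three single-axis rates explicitly and checks that they add up to the usual cross product.

```python
def rate_about_x(ax, b):
    """Rate of change of b under rotation about the x axis at rate ax."""
    return (0, -ax * b[2], ax * b[1])

def rate_about_y(ay, b):
    """Rate of change of b under rotation about the y axis at rate ay."""
    return (ay * b[2], 0, -ay * b[0])

def rate_about_z(az, b):
    """Rate of change of b under rotation about the z axis at rate az."""
    return (-az * b[1], az * b[0], 0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a, b = (1, 2, 3), (4, 5, 6)
parts = [rate_about_x(a[0], b), rate_about_y(a[1], b), rate_about_z(a[2], b)]
total = tuple(sum(component) for component in zip(*parts))
print(total, cross(a, b))
```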


For 1: the reason the cross product isn't just the magnitude is that we also want it to be a vector whose direction is perpendicular to the two vectors spanning the plane. Now, why? Because in the example of the magnetic force, we found (from observations, discoveries, equations, etc.) that the force's direction is perpendicular to both $v$ and $B$. But at the same time, the magnitude of this force is proportional to $|v||B|\sin(\theta)$.

So to MODEL this phenomenon, we define the cross product operation to be what it is, just like the dot product and work. But in the case of work, it is useless to think of work as a vector, because it's just a scalar quantity, and its value is all we care about. :) I hope that makes sense.

Oh, and, about the components, I tried to derive the formula for them based on the criteria that the vector must be perp. to the other 2 and has this magnitude etc, I got pretty close and I'm working on it. I'm in fact thinking about posting a question on this.


Matt L.'s answer is the one I would give but it could use some elaboration.

First, get comfortable with the fact that an n-D determinant is the signed volume of an n-D parallelepiped. We're concerned with cross products in 3-D. This entails 2-D determinants -- signed areas of parallelograms. Here's a good video on this.

Now, what's the significance of the projections of the areas formed by two vectors onto coordinate planes?

Before even worrying about the cross product or areas formed by two vectors, let's motivate the idea of projecting area by thinking about tops:

[image: a spinning top]

(ignore the shadow in this picture. The light source is not straight vertically down, so it doesn't fit into the analogy)

The reason tops come to mind is that a top has everything we need:

  1. area. The area of the "circle" part of the top. We don't really care what the area actually is and we certainly don't care that it involves $\pi$. All we care about is that it's some amount of area that lives in its own flat plane.
  2. a "vector" orthogonal to that area. The handle part sticking out of the circle is orthogonal to the plane that the circle lies in.

Let's say we start out with the top upright. So the circle lies in the $x,y$ plane, and the handle points along the $z$ axis.

Let's say the area of the top's circle is $A$. If we shine an idealized flashlight straight down onto the top, the area of the shadow on the table underneath the top should be $A$, right? The light rays are shining down along the $z$ axis, and striking the table (flat, parallel to the $x,y$ plane), except for those rays blocked by the top's circle.

Then, what if we tilt the top in an arbitrary direction? Its physical area $A$ will not change, but the area of the shadow will now decrease, right? Let's call the area of its shadow after tilting $S_{tilted}$.

The key insight is that $\frac{S_{tilted}}{A} = \cos(\theta)$, where $\theta$ is the angle between the top's surface and the $x, y$ plane, which is also the angle between the top's handle and the $z$ axis.

That is, as we tilt our top, the handle forms a widening angle with the $z$ axis, and since the handle is rigidly orthogonal to the circle, the circle (the circle's plane) is simultaneously forming that same angle with the $x,y$ plane.

The same thing holds for the other coordinate planes. We could rename $S_{tilted}$ to $S_{table}$, and then also measure $S_{western-wall}$, and $S_{southern-wall}$.

$S_{western-wall}$ is the area of the shadow we get by shining the flashlight "down" the $x$ axis and making a shadow on our western wall, the $y, z$ plane. Likewise for $S_{southern-wall}$.

We can use only these shadow areas and no knowledge of the top's handle to produce a vector that points in the same direction as the top's handle. This will be analogous to using the projections of the area formed by two vectors to get a third vector orthogonal to both of them -- i.e., the cross product.

So if we put these shadow measurements into a vector, we could have

$$ \begin{align} \left( \begin{matrix} S_{western-wall} \\ S_{southern-wall} \\ S_{table} \end{matrix} \right) & = A \cdot \left( \begin{matrix} \text{ cosine of angle between top and y, z plane}\\ \text{ cosine of angle between top and z, x plane}\\ \text{ cosine of angle between top and x, y plane}\\ \end{matrix} \right) \\ \\ & = A \cdot \left( \begin{matrix} \text{ cosine of angle between handle and x axis}\\ \text{ cosine of angle between handle and y axis}\\ \text{ cosine of angle between handle and z axis}\\ \end{matrix} \right) \\ \\ & = A \cdot \text{direction of handle} \end{align} $$

specifically, $A$ times a unit vector pointing in the direction of the handle, because knowing the angle between the handle and each of the axes tells us the direction of the handle.

So all we did was use a flashlight to measure the various shadows (projections), and this gave us a vector pointing in the direction of the handle, with length $A$ (roughly the "cross product", the thing we really wanted). Notice we never measured $A$ directly, and we never measured anything about the handle directly. We only used the knowledge that the handle is orthogonal to the top's surface.

Remaining things we need to figure out

  1. What exactly is the angle "between two planes"? Why does cosine/trig still apply here like it does with angles between lines?
  2. This was all done with the area of a circle. What about the area of a parallelogram formed with our two vectors we want a cross product of?
  3. How does the formula for a cross product give us the areas of the shadows (projections)?

Imagine an opened door. The door forms an angle with the wall / doorway. Imagine that the door and wall were painted with thin horizontal stripes. Each stripe "gets its own angle" between the door and the wall when you open the door. If you're facing the wall when the door opens, the line segments on the door will be "visually scaled" by a factor of $\cos(\theta)$. When the door is fully open to 90 degrees, you won't see the line segments on the door at all ($\cos(90^\circ) = 0$).

If you outlined a circle on the door, the area of that circle is made up of lots of thin horizontal stripes. Each stripe gets scaled by the same amount -- $\cos(\theta)$. Therefore, the entire area of the circle is scaled by this factor.

But it doesn't really matter that you drew a circle, does it? This would apply to any drawing. It could be a drawing of a star shape, or it could be a drawing of a parallelogram formed with two vectors. All area in the door plane is scaled by the same factor as you watch the door open.
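
Here is a small Python sketch of the door picture, under assumptions of my own choosing: the hinge is the vertical $y$ axis, the viewer faces the wall so only the horizontal coordinate is foreshortened, the drawing is a five-pointed star, and areas are computed with the shoelace formula. The function names are made up for illustration. The ratio of seen area to true area should come out as the cosine of the opening angle.

```python
import math

def shoelace_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

def seen_from_wall(points, theta):
    """Outline as seen by someone facing the wall after the door opens by theta.

    The hinge is the y axis, so heights are unchanged and horizontal
    distances from the hinge shrink by cos(theta).
    """
    return [(x * math.cos(theta), y) for x, y in points]

# A five-pointed star drawn on the door (alternating outer and inner radii)
star = []
for k in range(10):
    radius = 1.0 if k % 2 == 0 else 0.4
    angle = k * math.pi / 5
    star.append((2 + radius * math.cos(angle), 2 + radius * math.sin(angle)))

theta = math.radians(35)
ratio = shoelace_area(seen_from_wall(star, theta)) / shoelace_area(star)
print(ratio, math.cos(theta))
```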

Any two planes are equivalent to a wall and a door. They just might not be aligned with gravity. The line of intersection between the two planes is where the door's hinges and the "crack" of the door are.

You could also think about starting with the top in its original $x, y$ plane position, grabbing onto the tip of the handle, and tilting the top. As you tilt the top, the tip of the handle traces out an arc of a circle in the air between its original $z$-axis-aligned position and its new position after tilting. You could paint the top's surface with lots of thin lines that are parallel to the plane of that arc. Each of these lines tilts away from its original position inside the $x, y$ plane by the same angle that the handle is tilted from the $z$ axis.

That sort of addresses 1 and 2, which leaves 3.

If you have a parallelogram formed with two vectors $v$ and $w$, how do you get its projection on the $x, y$ plane? You delete the $z$ coordinates of $v$ and $w$ (set them to 0). Now you have two 2-D vectors in the $x, y$ plane. If you look closely at the cross product formula, you'll see that you're using the determinant of these 2-D vectors to get the third coordinate of the cross product. That's the (signed) area of the shadow on the table, $S_{table}$.
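
To tie the shadows back to the "handle", the following Python sketch (the function name `shadow_area` and the sample vectors are illustrative assumptions) computes the three signed shadow areas by deleting one coordinate at a time, stacks them into a vector, and then checks two things: the vector is orthogonal to both $v$ and $w$, and its length matches the parallelogram's true area computed independently from $|v|^2|w|^2-(v\cdot w)^2$.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def shadow_area(v, w, drop):
    """Signed area of the parallelogram's shadow after deleting coordinate `drop`.

    The two remaining coordinates are taken in cyclic order (yz, zx, xy),
    which is what fixes the signs.
    """
    i, j = (drop + 1) % 3, (drop + 2) % 3
    return v[i] * w[j] - v[j] * w[i]

v, w = (1, 2, 3), (4, 5, 6)

# (western wall, southern wall, table) = shadows on the yz, zx and xy planes
handle = tuple(shadow_area(v, w, drop) for drop in range(3))

# True area of the parallelogram, without using the cross product
true_area = math.sqrt(dot(v, v) * dot(w, w) - dot(v, w) ** 2)
handle_length = math.sqrt(dot(handle, handle))

print(handle, dot(handle, v), dot(handle, w), handle_length, true_area)
```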

This hasn't covered the matter of orientation at all. That could be covered elsewhere. A small hint would be... notice I've referred to the $z, x$ plane as the "southern wall" one. Why not the $x, z$ plane? In a sense, $x, z$ is out of order.