Why is the dot product of two vectors defined the way in which it is?
Most people who’ve sat through any lesson involving vectors will know about the vector dot product:
If $\displaystyle \mathbf p=\left[{v_1\atop v_2}\right]$ and $\displaystyle \mathbf q=\left[{w_1\atop w_2}\right]$, then $$\mathbf p \cdot \mathbf q=v_1w_1+v_2w_2$$
Obviously this is the special case where the vectors lie in the two-dimensional plane, and this formula does extend to $n$ dimensions; my typesetting abilities just aren’t advanced enough to represent such vectors. In any case, I’m just wondering, so that I don’t go through the rest of any future linear algebra courses blind: why do we define the dot product of vectors in this way? Instead of, for example, multiplying each component of one vector by the corresponding component of a second vector to produce a third vector, whose components are these products?
Any input is appreciated, thank you.
Tl;dr
In my opinion, the dot product cannot be motivated naturally, because no single one of its applications justifies this exact definition. However, the sheer number of naturally occurring formulas which contain one or more terms of the form $v_1\cdot w_1+v_2\cdot w_2$ gives a hard-to-argue-with a posteriori motivation for this exact notation.
So the reason why the dot product is defined this way and no other: because this is the term which occurs in hundreds of naturally emerging formulas, and no other.
The nature of definitions
In contrast to mathematical proofs or ideas for how to solve certain problems, which must be developed from the first second on, many definitions are given a posteriori, i.e. after the subject has reached some maturity. The reason is that only after solving many similar problems does it turn out which definitions would have been useful in the first place. Many definitions arise for one of the following reasons:
- A certain important term is long, ugly or hard to remember. Therefore we introduce a shorthand to hide some complexity.
- A certain term occurs over and over, and it seems that introducing a shorthand creates some useful abstraction and might reveal what is really going on.
Another reason for definitions, which also makes sense a priori, is the following:
- We know what we want to compute, but we lack the exact expression $-$ for now. Still, we have to develop a whole lot of theory until we have a result. Therefore we introduce a placeholder term. This is often done for quantities arising from modeling reality, e.g. curve lengths.
The dot product is a classic example of the second motivation (among others like determinants, matrix multiplication, ...). Look at the following problems and their solutions. I will not show you how to derive them as this will be done as you advance in linear algebra (or you already know them):
- Do you want to compute the length of a vector $\mathbf v=(v_1,v_2)$? Do it like this: $$\sqrt{v_1\cdot v_1+v_2\cdot v_2}.$$
- Do you want to know the angle $\alpha$ between two vectors $\mathbf v=(v_1,v_2)$ and $\mathbf w=(w_1,w_2)$? Do it like this:$$\cos(\alpha)=\frac{v_1\cdot w_1+v_2\cdot w_2}{\sqrt{v_1\cdot v_1+v_2\cdot v_2}\sqrt{w_1\cdot w_1+w_2\cdot w_2}}.$$
- Do you need to project a vector $\mathbf v=(v_1,v_2)$ onto a plane with normal vector $\mathbf n=(n_1,n_2)$? Do it like this: $$\mathbf v-\frac{v_1\cdot n_1+v_2\cdot n_2}{n_1\cdot n_1+n_2\cdot n_2}\mathbf n.$$
- Do you need to know if two vectors $\mathbf v=(v_1,v_2)$ and $\mathbf w=(w_1,w_2)$ are orthogonal? Check whether $$v_1\cdot w_1+v_2\cdot w_2=0.$$
All these problems arise naturally in a geometrically motivated subject like linear algebra. And do you see what they all have in common? They can all benefit from the definition
$$\mathbf v\cdot \mathbf w := v_1\cdot w_1+v_2\cdot w_2.$$
All the complexity vanishes and we get (in this order):
$$\sqrt{\mathbf v\cdot \mathbf v},\qquad \cos(\alpha)=\frac{\mathbf v\cdot \mathbf w}{\sqrt{\mathbf v\cdot\mathbf v}\sqrt{\mathbf w\cdot\mathbf w}},\qquad \mathbf v-\frac{\mathbf v\cdot\mathbf n}{\mathbf n\cdot\mathbf n}\mathbf n,\qquad\mathbf v\cdot\mathbf w=0.$$
Further simplification can be obtained via the definition $\|\mathbf v\|=\sqrt{\mathbf v\cdot\mathbf v}$ after it is proven that $\mathbf v\cdot\mathbf v\ge0$. Also, this definition opens up the way for a coordinate-free approach to linear algebra which only then justifies the word algebra in the name.
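To make the recurring pattern concrete, here is a small numeric sketch of the four toy problems above, assuming plain 2D tuples and a hypothetical helper `dot` that implements the textbook definition:

```python
import math

# Hypothetical helper for illustration: the textbook definition
# v1*w1 + v2*w2 in two dimensions.
def dot(v, w):
    return v[0] * w[0] + v[1] * w[1]

v, w = (3.0, 4.0), (4.0, -3.0)

# Length of v: sqrt(v.v)
length = math.sqrt(dot(v, v))

# Cosine of the angle between v and w
cos_angle = dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))

# Orthogonality test: v.w == 0
orthogonal = dot(v, w) == 0

# Remove from v its component along the normal vector n
n = (0.0, 1.0)
coeff = dot(v, n) / dot(n, n)
projected = (v[0] - coeff * n[0], v[1] - coeff * n[1])

print(length, cos_angle, orthogonal, projected)
```

For these sample vectors, $(3,4)$ and $(4,-3)$ happen to be orthogonal, so the length comes out as $5$, the cosine as $0$, and the projection drops the second component of $\mathbf v$.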
From a didactic point of view
I generally avoid introducing definitions without some motivation. For very central and recurring elements like the dot product it is hard to demonstrate the true importance before giving the definition $-$ if only for notational reasons.
But what can be done is to work through at least two of the above toy problems, thereby demonstrating the recurring character of this term in naturally occurring tasks.
Only after this definition has proven its usefulness in certain relevant problems is it appropriate to give definitions which are more like theorems:
- This definition is the only way to define a symmetric bilinear multiplication on vectors that yields a scalar and also gives $\mathbf e_1\cdot\mathbf e_1=\mathbf e_2\cdot\mathbf e_2=1$ and $\mathbf e_1\cdot\mathbf e_2=0$ for $\mathbf e_1=(1,0)$ and $\mathbf e_2=(0,1)$.
or which are mainly based on further unmotivated axioms:
- A dot product is a symmetric, positive definite bilinear form.
There can be many answers to your question, but let me explain my thoughts.
First, as you have noticed, the inner product isn't really a product in the usual sense, since the output is not again an element of $\mathbb{R}^n$ but rather just a real number (and that's why it is called "inner"). Now, in a really down-to-earth sense, the inner product is a tool to measure angles. In fact, the formal way to define the angle between two elements of $\mathbb{R}^n$ is $$\theta=\cos^{-1}\left(\frac{\langle v,w\rangle}{\sqrt{\langle v,v\rangle}\sqrt{\langle w,w\rangle}}\right).$$
Why do we define the inner product that way? Taking a cue from the properties of the usual Euclidean inner product, we say that a bilinear form on a vector space (a function that takes two vectors and spits out a number) is called an inner product if it satisfies certain axioms (see Wikipedia's article). One can see that in Euclidean spaces all inner products stem from the standard one via an appropriate change of basis. So that answers why we define the operation like that.
What good is the inner product for? The parallelogram law, of course: $$\|x\|^2+\|y\|^2=\frac{\|x+y\|^2+\|x-y\|^2}{2},$$ which is easily verified using $\|x\|^2=\langle x,x\rangle$. (It is a hard and surprising theorem that if you have a normed space satisfying the parallelogram law, you also have an inner product!) This property is not only a generalisation of the good old Pythagorean theorem, but it is really the essential tool to prove geometric properties of linear spaces. If you read about Hilbert spaces and Banach spaces, you will come to realise how useful this is.
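As a quick numeric sanity check (a sketch, assuming the Euclidean norm on $\mathbb{R}^2$ and a hypothetical helper `norm_sq`), the parallelogram law can be verified on random vectors:

```python
import math
import random

# ||x||^2 for the Euclidean norm on R^2, via the inner product <x,x>.
def norm_sq(v):
    return v[0] * v[0] + v[1] * v[1]

random.seed(0)
for _ in range(100):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    # Parallelogram law: ||x||^2 + ||y||^2 = (||x+y||^2 + ||x-y||^2) / 2
    lhs = norm_sq(x) + norm_sq(y)
    rhs = (norm_sq((x[0] + y[0], x[1] + y[1]))
           + norm_sq((x[0] - y[0], x[1] - y[1]))) / 2
    assert math.isclose(lhs, rhs)
print("parallelogram law holds on 100 random pairs")
```

Of course this only spot-checks the identity; the algebraic proof is a two-line expansion of $\langle x+y,x+y\rangle+\langle x-y,x-y\rangle$ using bilinearity.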
Finally, why not define multiplication pointwise? Simply because it is useless! Pointwise multiplication really has no good properties, and I haven't seen any good usage of it. It is really hard to define a multiplication between vectors of $\mathbb{R}^n$ that returns another vector of $\mathbb{R}^n$, and in fact it is a really surprising (and deep) theorem that you cannot always do it (indeed, such a multiplication exists only in $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^4$ and $\mathbb{R}^8$, and it gets progressively "weaker"; see Wikipedia's articles on quaternions and octonions).
TL;DR: We want a function that measures angles, and especially orthogonality, and this definition is the simplest we can get.
The law of cosines in elementary Euclidean geometry gives a formula relating angles and lengths. When it is interpreted in coordinates and expanded, the angle term that emerges is exactly the standard inner product, and since $|\cos\theta|\le 1$ this also gives rise to the Cauchy-Schwarz inequality. This is why the standard inner product is defined the way it is.
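Spelling the argument out (a sketch of the standard derivation): apply the law of cosines to the triangle with sides $\mathbf v$, $\mathbf w$ and $\mathbf v-\mathbf w$, then expand in coordinates:

```latex
\begin{align*}
\|\mathbf v-\mathbf w\|^2 &= \|\mathbf v\|^2+\|\mathbf w\|^2
  -2\|\mathbf v\|\|\mathbf w\|\cos\theta && \text{(law of cosines)}\\
(v_1-w_1)^2+(v_2-w_2)^2 &= v_1^2+v_2^2+w_1^2+w_2^2
  -2\|\mathbf v\|\|\mathbf w\|\cos\theta && \text{(in coordinates)}\\
\Rightarrow\quad v_1w_1+v_2w_2 &= \|\mathbf v\|\|\mathbf w\|\cos\theta.
\end{align*}
```

Since $|\cos\theta|\le 1$, the last line immediately yields $|v_1w_1+v_2w_2|\le\|\mathbf v\|\|\mathbf w\|$, the Cauchy-Schwarz inequality in two dimensions.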
Here is a physicist's point of view. A dot product is a projection of one vector onto another. Define each vector in polar coordinates, $\mathbf p=p\,(\cos\theta_p,\sin\theta_p)$ and $\mathbf q=q\,(\cos\theta_q,\sin\theta_q)$. The dot product gives you $$\mathbf p\cdot\mathbf q=pq\,(\cos\theta_p\cos\theta_q+\sin\theta_p\sin\theta_q)=pq\cos(\theta_p-\theta_q),$$ which depends only on the difference angle between the two vectors. With a little bit of geometry, you can see the dot product is $q$ times the component of vector $\mathbf p$ along $\mathbf q$. In fact, this formula can be generalized to any dimension, because you can always define vectors in polar coordinates.
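A quick numeric check of that polar-coordinate identity (a sketch; the lengths and angles below are arbitrary sample values chosen for illustration):

```python
import math

# Componentwise dot product in R^2.
def dot(v, w):
    return v[0] * w[0] + v[1] * w[1]

# Sample polar data: |p| = 2 at 30 degrees, |q| = 3 at 75 degrees.
p_len, theta_p = 2.0, math.radians(30)
q_len, theta_q = 3.0, math.radians(75)

p = (p_len * math.cos(theta_p), p_len * math.sin(theta_p))
q = (q_len * math.cos(theta_q), q_len * math.sin(theta_q))

# p.q computed componentwise should equal p q cos(theta_p - theta_q).
lhs = dot(p, q)
rhs = p_len * q_len * math.cos(theta_p - theta_q)
assert math.isclose(lhs, rhs)
print(lhs)
```

Here the difference angle is $45^\circ$, so both sides come out to $6\cos 45^\circ\approx 4.243$.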