How do I think intuitively about the properties of inner products?

I'm currently self-studying linear algebra and I'd like an intuitive way to think about why the properties for an inner product are what they are.

Metric and norm properties make sense to me when I think about them as mathematical representations of 'distance' and 'size' respectively.

  • e.g. the distance between any two points of a set should only be 0 if they're the same point

But I can't find any such intuition for properties of inner products like

$$\langle u+v, w\rangle=\langle u, w\rangle+\langle v, w\rangle, \quad \forall u, v, w \in V$$

In the book "Linear Algebra Done Right", the author talks about the properties of the Dot Product in Euclidean Space as motivation for the properties of inner products in general. But I assume the Dot Product has many properties so why just define inner products with the following four properties out of all the properties mathematicians could have chosen?

  1. additivity in first argument
  2. homogeneity in first argument
  3. conjugate symmetry
  4. positive-definiteness

Short answer:

An inner product is something which satisfies the Cauchy-Schwarz inequality. This inequality guarantees that in any inner product space one can talk about lengths, angles, and (orthogonal) projections, analogously to our intuition in $2$- and $3$-dimensional Euclidean geometry. The four properties you list are (almost) the bare minimum one can distill from a dot product so that the resulting product still satisfies Cauchy-Schwarz, hence behaves according to our geometric intuition.

Long answer:

I will restrict to the real case here. Once one gets a grip on that, it is a different question why, for the ground field $\mathbb C$, one makes slight adjustments involving complex conjugation: see e.g. the answers to Why are inner products defined to be linear in the first argument only? and Why conjugate when switching order of inner product?, or the answer https://math.stackexchange.com/a/2699556/96384.

1. Lengths …

You say that you find the idea of metrics and norms quite intuitive, which is a great start. To recall, a norm (on a vector space) is a way to assign to every nonzero vector $v$ a positive length $\Vert v \Vert$, which "scales" in the way one would expect ($\Vert a v \Vert = \lvert a \rvert \Vert v \Vert$, where $a$ is a real scalar) and which satisfies $\Vert v + w \Vert \le \Vert v \Vert + \Vert w \Vert$. Once one demands this version of the triangle inequality, indeed $d(v,w) := \Vert v-w \Vert$ defines a distance (metric) on the vector space, where a vector's norm is just its distance to the origin.
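For instance, the triangle inequality for this distance is just the norm's triangle inequality applied to the decomposition $u - w = (u-v) + (v-w)$:

$$d(u,w) = \Vert u - w \Vert = \Vert (u-v) + (v-w) \Vert \le \Vert u-v \Vert + \Vert v-w \Vert = d(u,v) + d(v,w).$$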

2. … and angles!

Now, one can and does study normed vector spaces, i.e. vector spaces with an additional measure of lengths and distances well-behaved under addition and scaling; that is a rich theory in itself. But an inner product not only defines lengths and distances via such a norm, it gives even more "geometric" content, namely, angles between vectors.

Indeed, if on our real vector space we have an inner product with the four properties you list, then, first of all, the Cauchy-Schwarz inequality guarantees that $\Vert v \Vert := \sqrt{\langle v, v \rangle}$ indeed does define a norm: it enforces the triangle inequality. [And in the case of the dot product on plain old $\mathbb R^3$ or the like, the norm arising in this way gives us, via Pythagoras, what we intuitively view as the lengths of vectors visualized as "arrows in space".]
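To spell that step out: by additivity and symmetry, and then Cauchy-Schwarz,

$$\Vert v+w \Vert^2 = \langle v+w, v+w \rangle = \Vert v \Vert^2 + 2\langle v, w \rangle + \Vert w \Vert^2 \le \Vert v \Vert^2 + 2\Vert v \Vert \Vert w \Vert + \Vert w \Vert^2 = (\Vert v \Vert + \Vert w \Vert)^2,$$

and taking square roots gives $\Vert v+w \Vert \le \Vert v \Vert + \Vert w \Vert$.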

But Cauchy-Schwarz gives us much more: It guarantees that for any two vectors $v \neq 0 \neq w$, the quantity

$$\dfrac{\langle v, w \rangle}{\Vert v \Vert \cdot \Vert w \Vert}$$

lies in the interval $[-1, 1]$; and further, this quantity is $1$ (or $-1$) exactly if $v$ is a positive (or negative) multiple of $w$. So, taking our cue from the $2$- and $3$-dimensional cases which we can visualize, where the dot product produces exactly this quantity as the cosine of the enclosed angle, we now turn that into a definition: the angle between $v$ and $w$ is the $\arccos$ of that quantity,

$$\measuredangle (v,w) := \arccos \left( \dfrac{\langle v, w \rangle}{\Vert v \Vert \cdot \Vert w \Vert}\right)$$

So once you have an inner product, e.g. on certain spaces of integrable functions, then you have just defined angles between any two such functions!
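To make that concrete, here is a small numerical sketch (the inner product is $\langle f, g \rangle = \int_0^1 f(x)g(x)\,dx$ on real-valued functions; the functions $f(x) = 1$, $g(x) = x$ and the grid size are arbitrary choices for illustration):

```python
import numpy as np

# Inner product on (real) L^2[0,1], approximated by a Riemann sum:
#   <f, g> = integral_0^1 f(x) g(x) dx
x = np.linspace(0.0, 1.0, 100_001)
dx = x[1] - x[0]

def inner(f, g):
    return np.sum(f(x) * g(x)) * dx

f = lambda t: np.ones_like(t)   # f(x) = 1
g = lambda t: t                 # g(x) = x

norm_f = np.sqrt(inner(f, f))   # exactly 1
norm_g = np.sqrt(inner(g, g))   # 1/sqrt(3)

cos_angle = inner(f, g) / (norm_f * norm_g)   # (1/2) / (1/sqrt(3)) = sqrt(3)/2
print(np.degrees(np.arccos(cos_angle)))       # ~ 30 degrees
```

So with this inner product, the constant function $1$ and the identity function $x$ enclose an angle of $30°$.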

I stress that all four properties go into making this well-defined. The last two are also "visible" in this interpretation: Property 4, positive-definiteness, ensures every nonzero vector has a positive length. Property 3, symmetry, is particularly plain in the angle interpretation, as the angle between two vectors is always the "(non-oriented) enclosed" angle, the one whose value is between $0$ and $\pi \sim 180°$, which is symmetric in $v$ and $w$. A particularly prominent case of this is that we declare two vectors to be orthogonal, $v \perp w$, if and only if $\langle v, w \rangle = 0$.

By the way, one can turn this around and write $$\langle v, w \rangle = \Vert v \Vert \cdot \Vert w \Vert \cdot \cos(\measuredangle(v,w))$$

So geometrically, the inner product of two vectors is the product of their lengths times the cosine of the angle between them. Of course, this "high school definition" of an inner product is cheating unless you already know (from some other definition) what the angle between two vectors is, which you might know from Euclidean geometry in $\mathbb R^2$ or $\mathbb R^3$, but certainly not in a space like $L^2[0,1]$. If you want to define angles on otherwise hard-to-visualize spaces like that, then you should do it so that the above "cosine definition" of the inner product comes out true!

Anyway, this guarantees that the inner product of $v$ and $w$

  • is zero iff the vectors are perpendicular to each other,

  • is the product of their lengths iff they point in the same direction (are positive multiples of each other),

  • is the negative product of their lengths iff they point in opposite directions (are negative multiples of each other), and

  • is continuously well-behaved (via a trigonometric function) in the continuum of cases between those extremes.

Further: If both our vectors are unit vectors, i.e. normalized to length $\Vert v \Vert = \Vert w \Vert = 1$, then the previous equation reduces to

$$\langle v, w \rangle = \cos(\measuredangle(v,w)).$$

If you draw a $2D$ picture of this, with $v, w$ two vectors on the unit circle, project one of them orthogonally onto the other, and compare with the high school definition of cosine, you'll see that this naturally leads to the final intuitive viewpoint:

3. Projections

For the dot product on $\mathbb R^n$, we know that $\langle v, e_i \rangle$ gives the $i$-th coordinate of $v$. Or, we can describe the projection of $v$ onto the $i$-th coordinate axis as the vector pointing in direction $e_i$ with (signed) length $\langle v, e_i \rangle$.

Now the final core idea of inner products is to vastly generalize this to:

If $w$ is a vector of length 1, then $\langle v, w \rangle$ is the (signed) length of the projection of $v$ onto (the line through) $w$.

Which, if vectors are visualized as arrows, should make you see this picture:

[Figure: $v$ and $w$ drawn as arrows, with $v$ projected orthogonally onto the line through the unit vector $w$; the projection has (signed) length $\langle v, w \rangle$.]

And after lengths motivated property 4, and angles motivated property 3, this projection business should motivate the first two properties, as it is quite clear that

  • the projection of a scaled vector is just scaled accordingly, $\langle av, w\rangle = a\langle v, w\rangle$, and

  • the projection of a sum is the sum of the projections: $\langle v_1+v_2, w \rangle =\langle v_1, w \rangle + \langle v_2, w\rangle$.

NB: For this we assumed that $w$ has length $1$. Keeping such $w$ fixed, we then stretch $v$ to see that its projection on $w$ stretches with the same factor. But what if instead we stretch the other vector $w$? The projection of $v$ on $w$ is of the same length as the projection of $v$ on $aw$ for any $a \neq 0$. It is a nice little exercise to infer from this that for a vector $w$ of general positive length, the projection of $v$ on $w$ is the vector $\dfrac{\langle v,w \rangle}{\langle w,w \rangle}\, w$, whose signed length along $w$ is $\dfrac{\langle v,w \rangle}{\Vert w \Vert}$.
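Here is a quick numerical check of that exercise, as a sketch with the ordinary dot product on $\mathbb R^3$ (the two vectors are arbitrary choices):

```python
import numpy as np

v = np.array([3.0, 1.0, 2.0])
w = np.array([2.0, -1.0, 2.0])   # deliberately not a unit vector

# Orthogonal projection of v onto the line through w:
proj = (v @ w) / (w @ w) * w

# The residual v - proj is orthogonal to w ...
print(np.isclose((v - proj) @ w, 0.0))   # True

# ... and the projection's length is |<v, w>| / ||w||.
print(np.isclose(np.linalg.norm(proj), abs(v @ w) / np.linalg.norm(w)))   # True
```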

And from this, it is just a tiny step to "orthonormal bases" via the Gram-Schmidt process, meaning that we can choose "pairwise orthogonal coordinate axes", with "unit vectors" on them, which for all intents and purposes behave like the $x$-, $y$- and $z$-coordinates in the Euclidean space we imagine to inhabit.
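A minimal sketch of that process, for the dot product on $\mathbb R^n$ (classical Gram-Schmidt; the input vectors are assumed to be linearly independent, and the example vectors are arbitrary):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal list, using only
    inner products: subtract the projections onto the axes built so far, then normalize."""
    basis = []
    for v in vectors:
        for e in basis:
            v = v - (v @ e) * e   # remove the component of v along e
        basis.append(v / np.linalg.norm(v))
    return basis

e1, e2, e3 = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                           np.array([1.0, 0.0, 1.0]),
                           np.array([0.0, 1.0, 1.0])])
print(np.round([e1 @ e2, e1 @ e3, e2 @ e3], 12))   # pairwise orthogonal: all 0
print(np.round([e1 @ e1, e2 @ e2, e3 @ e3], 12))   # unit length: all 1
```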

4. Cauchy-Schwarz

The crucial mathematical fact, which makes all of this work in general and at the same time match our intuition in low dimensions, is the Cauchy-Schwarz inequality

$$\dfrac{\lvert \langle v, w \rangle\rvert }{\Vert v \Vert \cdot \Vert w \Vert} \le 1.$$

It is absolutely worthwhile to read about this, and to see how it connects to everything written above (good answers on this site are found at Intuition for the Cauchy-Schwarz inequality and A natural proof of the Cauchy-Schwarz inequality). Any proof of the Cauchy-Schwarz inequality needs all properties 1-3 and "almost" property 4, and ultimately that is the best reason for demanding them.
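To indicate how the properties enter, here is the standard short proof in the real case. For any scalar $t$, positive (semi-)definiteness applied to $v - tw$ gives, after expanding with additivity, homogeneity and symmetry,

$$0 \le \langle v-tw, v-tw \rangle = \langle v, v \rangle - 2t\langle v, w \rangle + t^2 \langle w, w \rangle.$$

For $w \neq 0$ (so $\langle w, w \rangle > 0$), plugging in $t = \dfrac{\langle v, w \rangle}{\langle w, w \rangle}$ (exactly the projection coefficient from section 3) and rearranging yields $\langle v, w \rangle^2 \le \langle v, v \rangle \langle w, w \rangle$, i.e. Cauchy-Schwarz.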

As regards property 4, to be extremely precise (Does Cauchy-Schwarz Inequality depend on positive definiteness?), we actually only need positive semi-definiteness, i.e. we could get along with nonzero vectors satisfying $\langle v, v \rangle = 0$, although we would be forced to (not) view them as invisible vectors lurking behind some Euclidean space. However, as soon as we go one step further, to indefinite symmetric bilinear forms (useful as those are in other contexts), Cauchy-Schwarz, and hence the whole Euclidean angles-and-projections intuition, breaks down. Something which does not satisfy Cauchy-Schwarz cannot fully play the roles of anything I have written about above.
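For a concrete instance of that breakdown, take the indefinite form $\langle v, w \rangle := v_1 w_1 - v_2 w_2$ on $\mathbb R^2$ (a Minkowski-type form, chosen just as an illustration). The vector $v = (1,1)$ satisfies $\langle v, v \rangle = 0$ although $v \neq 0$, and with $w = (1,0)$ we get $\lvert \langle v, w \rangle \rvert = 1 > 0 = \sqrt{\langle v, v \rangle}\sqrt{\langle w, w \rangle}$, so Cauchy-Schwarz fails and the $\arccos$ formula above cannot define an angle between $v$ and $w$.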

5. Further reading (i.e. more concise version of my blatherings)

  • What is a complex inner product space "really"?
  • What Is An Inner Product Space?
  • Intuitive Explanation of the Inner Product