Motive for the definition of inner product

Mathematicians pride themselves on writing proofs of propositions in an elegant way, but frequently (maybe even usually?) neglect to write motivations of definitions formally, with the same elegance, efficiency, and sometimes beauty. They also neglect to assign exercises in which the student is challenged to prove that a definition is the only one (sometimes up to logically equivalent reformulations) that satisfies some desiderata. Often a textbook will just say dogmatically "A left and right conjugate sub-hypopotamus is a thing that blah blah blah${}\,\ldots$" As I have remarked elsewhere in this forum, it is usually considered licit to define the concept of "group" by saying a group is a set with a binary operation satisfying etc. etc. etc. and then go on to prove a zillion theorems in group theory, rather than showing at the outset how the concept developed from a variety of concrete examples involving transformations in geometry, bijections, matrices, etc.

Definition: The inner product $\langle u,v\rangle$ of two vectors $u=(u_1,\ldots,u_n), v=(v_1,\ldots,v_n) \in\mathbb C^n$ is $$ \sum_{k=1}^n u_k \overline{{}\,v\,{}}_k $$ where $\overline{a}$ is the complex conjugate of $a\in\mathbb C$.

A friend once asked me why one takes conjugates, and perhaps why one conjugates only the components of the second vector and not those of the first.

My answer was that it makes $\|u\|^2$ positive in every case (except when $u=0$) and makes the Pythagorean theorem work neatly, so that some basic Euclidean geometry can be applied in these spaces. His reaction included a certain amount of indignation about not having been told that by textbooks and instructors. Perhaps this bears the same relation to a good publication-worthy answer that a vague handwaving argument does to a proper proof.
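A small numerical sketch of this answer in plain Python (the helper names `inner` and `bilinear` are ours, not standard): without the conjugate, the nonzero vector $(1,i)$ would get squared length $0$, whereas with it we get $|1|^2+|i|^2=2$, and a pair of orthogonal vectors satisfies the Pythagorean identity.

```python
def inner(u, v):
    """<u, v> = sum_k u_k * conj(v_k), conjugating the second argument as in the definition."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def bilinear(u, v):
    """The same sum without any conjugation, for comparison."""
    return sum(a * b for a, b in zip(u, v))

u = [1, 1j]
assert bilinear(u, u) == 0   # a nonzero vector would get "squared length" 0
assert inner(u, u) == 2      # |1|^2 + |i|^2

# Pythagoras: if <u, v> = 0 then ||u + v||^2 = ||u||^2 + ||v||^2.
v = [1j, 1]
w = [a + b for a, b in zip(u, v)]
assert inner(u, v) == 0
assert inner(w, w) == inner(u, u) + inner(v, v)
```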

So, suppose that our customs in regard to writing motivations of definitions were like our customs in regard to writing proofs. We would want them to be complete, correct, informative, able to satisfy reasonable demands for justification, elegant, beautiful, comprehensible to the intended audience, and as simple as possible subject to the foregoing constraints. Particularly clever motivations would be published in places like the Monthly in the same way that novel or unusually nice new proofs of old theorems are now published. A particularly brilliant motivation for a new definition might be the whole topic of a paper in a research journal, maybe in rare cases winning a Fields Medal.

So, how would one write a good publication-worthy motivation for the definition above in accordance with the standards outlined in the foregoing paragraph?


Hermitian metrics and real-valued metrics

The notion of a Hermitian metric arises naturally if one considers a complex vector space as its underlying real vector space with extra structure, and asks what the notion of a metric which "plays nicely" with that extra structure should be.

Complex vector spaces

Consider $V$ a complex vector space, i.e. it has a scalar multiplication $\mathbb{C} \times V \rightarrow V$ and satisfies the vector space properties. Then $V$ is also a real vector space (of twice the dimension, if it is finite dimensional) using the restriction of scalar multiplication $\mathbb{R} \times V \rightarrow V$.

The extra structure that $V$ has as a complex vector space is simply multiplication by $i$. That is, considering $V$ as a real vector space, we have a map $J : V \rightarrow V$ given by scalar multiplication by $i$. This map is linear and has $J^2 = -I$. (From this we can recover the structure of a complex vector space for $V$ as now we know how to scalar multiply by $a+bi$ by using the linear map $aI + bJ$.)
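To make $J$ concrete, here is a minimal Python sketch. It assumes $z\in\mathbb{C}^n$ is stored as the real vector $(\mathrm{Re}\,z_1,\ldots,\mathrm{Re}\,z_n,\mathrm{Im}\,z_1,\ldots,\mathrm{Im}\,z_n)$; all function names are hypothetical helpers. It checks that $J^2=-I$ and that the real-linear map $aI+bJ$ reproduces multiplication by $a+bi$.

```python
# Identify z in C^n with x = (Re z_1, ..., Re z_n, Im z_1, ..., Im z_n) in R^{2n}.
def to_real(z):
    return [w.real for w in z] + [w.imag for w in z]

def to_complex(x):
    n = len(x) // 2
    return [complex(x[j], x[n + j]) for j in range(n)]

def J(x):
    """Multiplication by i as a real-linear map of R^{2n}: (a, b) -> (-b, a)."""
    n = len(x) // 2
    return [-t for t in x[n:]] + x[:n]

def scalar_mult(a, b, x):
    """Multiply by a + bi using only real operations, i.e. apply aI + bJ."""
    return [a * s + b * t for s, t in zip(x, J(x))]

z = [1 + 2j, -3 + 0.5j]
x = to_real(z)
assert J(J(x)) == [-t for t in x]                                   # J^2 = -I
assert to_complex(scalar_mult(2.0, -1.5, x)) == [(2.0 - 1.5j) * w for w in z]
```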

Introducing a real-valued metric

Now let's introduce a real-valued metric on $V$, which is a symmetric, positive-definite, $\mathbb{R}$-bilinear form $g: V \times V \rightarrow \mathbb{R}$.

How should $g$ interact with the extra structure of multiplication by $i$, which we've denoted $J : V \rightarrow V$?

The natural choice is to require $J$ to be an isometry with respect to $g$. That is, $g(Jv,Jw) = g(v,w)$. Why is this the natural choice?

  1. Multiplication by $i$ on the complex plane with the usual metric is rotation counterclockwise by $90^\circ$, an isometry.

  2. If we consider $\mathbb{C}^n$ as $\mathbb{R}^{2n}$ and put the standard metric on $\mathbb{R}^{2n}$ then multiplication by $i$ is an isometry: it's rotation by $90^\circ$ on each of the complex coordinate axes.

  3. With $J^2 = -I$, which is an isometry, requiring $J$ to be an isometry as well is a fairly natural choice for how $J$ and $g$ should interact.

  4. It ends up working out quite nicely algebraically.
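Points 2 and 3 can be checked numerically with a short Python sketch, using the same real-parts-then-imaginary-parts coordinates (helper names are hypothetical). The second metric, `g_skewed`, is an illustrative choice of ours showing that compatibility is a genuine condition: stretching the imaginary axis of $\mathbb{C}\cong\mathbb{R}^2$ breaks it.

```python
import random

def J(x):
    """Multiplication by i on R^{2n}, coordinates (Re z_1..Re z_n, Im z_1..Im z_n)."""
    n = len(x) // 2
    return [-t for t in x[n:]] + x[:n]

def g_std(x, y):
    """The standard Euclidean inner product on R^{2n}."""
    return sum(s * t for s, t in zip(x, y))

def g_skewed(x, y):
    """A hypothetical metric on R^2 = C that weights the imaginary axis by 2."""
    return x[0] * y[0] + 2 * x[1] * y[1]

random.seed(0)
for _ in range(100):
    x = [random.randint(-9, 9) for _ in range(6)]
    y = [random.randint(-9, 9) for _ in range(6)]
    assert g_std(J(x), J(y)) == g_std(x, y)              # J is an isometry
    assert g_std(J(J(x)), J(J(y))) == g_std(x, y)        # and so is J^2 = -I

# Not every metric is compatible: with e1 = (1, 0), g(J e1, J e1) = 2 but g(e1, e1) = 1.
e1 = [1, 0]
assert g_skewed(J(e1), J(e1)) == 2 and g_skewed(e1, e1) == 1
```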

Consequences

Now we have $V$ a real vector space with the extra structure of multiplication by $i$, denoted $J: V \rightarrow V$, and with a metric $g$ satisfying $g(Jv,Jw) = g(v,w)$.

Claim: $g(v,Jv) = 0$.

Proof: $g(v,Jv) = g(Jv,J^2v) = g(Jv,-v) = g(-v,Jv) = -g(v,Jv)$, so it must be zero.

A symplectic form for free

Define a bilinear form $\omega: V \times V \rightarrow \mathbb{R}$ via $\omega(v,w) = g(v,Jw)$.

Then $\omega(v,v) = 0$, so $\omega$ is alternating. It's also non-degenerate, because $g$ is and $J$ is invertible. Therefore it's a symplectic form (an alternating non-degenerate bilinear form on $V$).
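The Claim and the properties of $\omega$ can be sanity-checked on a metric other than the standard one. The Python sketch below assumes a hypothetical Gram matrix `M` on $\mathbb{R}^4\cong\mathbb{C}^2$ of the block form $\begin{pmatrix}A & B\\ -B & A\end{pmatrix}$ with $A$ symmetric and $B$ antisymmetric; that block form is exactly what makes $J$ an isometry, and this particular `M` is positive definite. Integer vectors keep the arithmetic exact.

```python
import random

# Hypothetical compatible metric on R^4 = C^2 (our choice): A = [[2, 1], [1, 3]],
# B = [[0, 1], [-1, 0]]; positive definite (leading principal minors 2, 5, 8, 16).
M = [[2, 1, 0, 1],
     [1, 3, -1, 0],
     [0, -1, 2, 1],
     [1, 0, 1, 3]]

def J(x):
    """Multiplication by i in the coordinates (a_1, ..., a_n, b_1, ..., b_n)."""
    n = len(x) // 2
    return [-t for t in x[n:]] + x[:n]

def g(x, y):
    return sum(x[r] * M[r][c] * y[c] for r in range(4) for c in range(4))

def omega(x, y):
    return g(x, J(y))

def det(m):
    """Determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

random.seed(1)
for _ in range(100):
    v = [random.randint(-9, 9) for _ in range(4)]
    w = [random.randint(-9, 9) for _ in range(4)]
    assert g(J(v), J(w)) == g(v, w)        # J is an isometry of g
    assert g(v, J(v)) == 0                 # the Claim
    assert omega(v, v) == 0                # omega is alternating ...
    assert omega(w, v) == -omega(v, w)     # ... hence antisymmetric

# Non-degeneracy: the Gram matrix of omega in the standard basis is invertible.
E = [[int(r == c) for c in range(4)] for r in range(4)]
W = [[omega(E[r], E[c]) for c in range(4)] for r in range(4)]
assert det(W) != 0
```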

The Hermitian form

A Hermitian form is a different sort of object than $g$ or $\omega$: it's a real-bilinear form valued in $\mathbb{C}$, i.e. a bilinear map $V \times V \rightarrow \mathbb{C}$, which moreover is non-degenerate, satisfies $h(\lambda v,w) = \lambda h(v,w)$ and $h(v,\lambda w) = \overline{\lambda} h(v,w)$, and satisfies $h(v,w) = \overline{h(w,v)}$.

Let $h = g + i \omega$. Notice that we built this entirely out of $g$, with $h(v,w) = g(v,w) + i g(v,Jw)$. I claim this is a Hermitian metric.

Testing scalar multiplication in the first factor:

For real $a$: $h(av,w) = g(av,w) + i g(av,Jw) = a(g(v,w) + i g(v,Jw)) = a\, h(v,w)$.

And for real $b$, using $g(Jv,w) = g(J^2v,Jw) = -g(v,Jw)$: $h((bi)v,w) = g(bJv,w) + i g(bJv,Jw) = g(-bv,Jw) + i g(bv,w) = (bi)(g(v,w) + i g(v,Jw)) = (bi) h(v,w)$.

So $h(\lambda v,w) = \lambda h(v,w)$ for $\lambda \in \mathbb{C}$.

Testing scalar multiplication in the second factor:

For real $a$: $h(v,aw) = g(v,aw) + i g(v,aJw) = a(g(v,w) + i g(v,Jw)) = a\, h(v,w)$.

And for real $b$: $h(v,(bi)w) = g(v,bJw) + i g(v,bJ^2w) = b( g(v,Jw) - i g(v,w)) = (-bi)(g(v,w) + i g(v,Jw)) = (-bi) h(v,w)$.

So $h(v,\lambda w) = \overline{\lambda} h(v,w)$ for $\lambda \in \mathbb{C}$.

One also has:

$h(w,v) = g(w,v) + i \omega(w,v) = g(v,w) - i \omega(v,w) = \overline{h(v,w)}$, since $g$ is symmetric and $\omega$, being alternating, is antisymmetric.
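A Python sketch checking these three computations on the same kind of hypothetical compatible metric (the matrix `M` and the helper `times`, which lets $a+bi$ act as $aI+bJ$, are our illustrative choices). It also checks that $h(v,v)=g(v,v)$ is real and positive.

```python
import random

# Hypothetical compatible metric on R^4 = C^2, block form [[A, B], [-B, A]]
# with A symmetric, B antisymmetric, so J is an isometry; M is positive definite.
M = [[2, 1, 0, 1],
     [1, 3, -1, 0],
     [0, -1, 2, 1],
     [1, 0, 1, 3]]

def J(x):
    """Multiplication by i on R^4 in the coordinates (a_1, a_2, b_1, b_2)."""
    return [-x[2], -x[3], x[0], x[1]]

def g(x, y):
    return sum(x[r] * M[r][c] * y[c] for r in range(4) for c in range(4))

def h(v, w):
    """h = g + i*omega, where omega(v, w) = g(v, Jw)."""
    return complex(g(v, w), g(v, J(w)))

def times(lam, x):
    """The complex scalar lam = a + bi acting on R^4 as aI + bJ."""
    return [lam.real * s + lam.imag * t for s, t in zip(x, J(x))]

random.seed(2)
for _ in range(100):
    v = [random.randint(-9, 9) for _ in range(4)]
    w = [random.randint(-9, 9) for _ in range(4)]
    lam = complex(random.randint(-5, 5), random.randint(-5, 5))
    assert h(times(lam, v), w) == lam * h(v, w)              # linear in the first slot
    assert h(v, times(lam, w)) == lam.conjugate() * h(v, w)  # conjugate-linear in the second
    assert h(w, v) == h(v, w).conjugate()                    # Hermitian symmetry
    assert v == [0, 0, 0, 0] or (h(v, v).imag == 0 and h(v, v).real > 0)
```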

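Therefore:

$h$ is a Hermitian form, and since $h(v,v) = g(v,v) + i\omega(v,v) = g(v,v) > 0$ for $v \neq 0$, it is positive definite, i.e. a Hermitian metric. In fact, every Hermitian metric arises in this way: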

Going back

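Starting with a Hermitian metric $h$, let $g = \mathfrak{Re}(h)$ and $\omega = \mathfrak{Im}(h)$. Then you can check that $g$ is a real metric for which $J$ is an isometry, that $\omega$ is a symplectic form, and that $\omega(v,w) = g(v,Jw)$: indeed, $g(v,Jw) = \mathfrak{Re}\, h(v,iw) = \mathfrak{Re}\big({-i}\, h(v,w)\big) = \mathfrak{Im}\, h(v,w)$.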
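Going the other way can be sketched in Python as well, assuming a hypothetical Hermitian metric on $\mathbb{C}^2$ given by a positive-definite Hermitian matrix `H`, with $h(z,w)=\sum_{j,k} z_j H_{jk}\overline{w_k}$ (linear in the first slot, conjugate-linear in the second, matching the convention above). Here $J$ is literally multiplication by `1j`.

```python
import random

# Hypothetical Hermitian metric on C^2: H is Hermitian (H[k][j] = conj(H[j][k]))
# and positive definite (H[0][0] = 2 > 0, det H = 6 - |1 + i|^2 = 4 > 0).
H = [[2, 1 + 1j],
     [1 - 1j, 3]]

def h(z, w):
    return sum(z[j] * H[j][k] * w[k].conjugate() for j in range(2) for k in range(2))

def g(z, w):
    return h(z, w).real        # g = Re h

def omega(z, w):
    return h(z, w).imag        # omega = Im h

def rand_vec():
    return [complex(random.randint(-9, 9), random.randint(-9, 9)) for _ in range(2)]

random.seed(3)
for _ in range(100):
    z, w = rand_vec(), rand_vec()
    iz, iw = [1j * t for t in z], [1j * t for t in w]
    assert g(z, w) == g(w, z)             # g is symmetric
    assert g(iz, iw) == g(z, w)           # J (multiplication by i) is an isometry
    assert omega(z, w) == g(z, iw)        # omega(v, w) = g(v, Jw)
    assert omega(z, z) == 0               # omega is alternating
    assert z == [0, 0] or g(z, z) > 0     # g is positive definite
```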

In coordinates

Consider $\mathbb{C}^n$ with underlying real vector space $\mathbb{R}^{2n}$ with the standard real-valued metric on $\mathbb{R}^{2n}$. As noted above, multiplication by $i$ is an isometry for this metric.

Write $\mathbb{R}^{2n} = \mathbb{R}^n \oplus \mathbb{R}^n$, with multiplication by $i$ an isomorphism from the first factor to the second. Write $(v_j)_{j=1}^n$ for the standard basis for the first factor and $(Jv_j)_{j=1}^n$ for the corresponding standard basis for the second factor.

Then (using Einstein summation notation):

$h(a_jv_j + b_jJv_j, c_k v_k + d_k J v_k)$ is equal to

$g(a_jv_j + b_jJv_j, c_k v_k + d_k J v_k) + i g(a_jv_j + b_jJv_j, c_k Jv_k - d_k v_k)$

which equals

$a_j c_j + b_j d_j + i(- a_j d_j + b_j c_j)$

which equals

$(a_j + i b_j)(c_j - i d_j)$

With $z_j = a_j + i b_j$ and $w_j = c_j + i d_j$, this is the standard Hermitian form $\left<z,w\right> = z_j \overline{w}_j$.
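The coordinate computation can be confirmed numerically with a Python sketch (helper names are hypothetical; $x=(a_1,\ldots,a_n,b_1,\ldots,b_n)$ encodes $z_j=a_j+ib_j$): building $h=g+i\omega$ from the standard metric on $\mathbb{R}^{2n}$ reproduces $\sum_j z_j\overline{w_j}$.

```python
import random

def J(x):
    """Multiplication by i on R^{2n} in the coordinates (a_1..a_n, b_1..b_n)."""
    n = len(x) // 2
    return [-t for t in x[n:]] + x[:n]

def g(x, y):
    return sum(s * t for s, t in zip(x, y))   # standard metric on R^{2n}

def h(x, y):
    return complex(g(x, y), g(x, J(y)))       # h = g + i*omega, omega(x, y) = g(x, Jy)

def to_complex(x):
    n = len(x) // 2
    return [complex(x[j], x[n + j]) for j in range(n)]   # z_j = a_j + i b_j

random.seed(4)
n = 3
for _ in range(100):
    x = [random.randint(-9, 9) for _ in range(2 * n)]
    y = [random.randint(-9, 9) for _ in range(2 * n)]
    z, w = to_complex(x), to_complex(y)
    assert h(x, y) == sum(zj * wj.conjugate() for zj, wj in zip(z, w))
```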

An alternate convention

An alternative convention is to define $\omega(v,w) = g(Jv,w)$. This leads to a variant in which $g+i\omega$ is conjugate-linear in the first factor and linear in the second. That is, the standard form you'd get on $\mathbb{C}^n$ would be $\left<z,w\right> = \overline{z}_j w_j$.
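The same kind of check for the alternate convention, again a hypothetical Python sketch in the same coordinates: with $\omega(v,w)=g(Jv,w)$, the form $g+i\omega$ matches $\sum_j \overline{z_j}w_j$.

```python
import random

def J(x):
    """Multiplication by i on R^{2n} in the coordinates (a_1..a_n, b_1..b_n)."""
    n = len(x) // 2
    return [-t for t in x[n:]] + x[:n]

def g(x, y):
    return sum(s * t for s, t in zip(x, y))   # standard metric on R^{2n}

def h_alt(x, y):
    return complex(g(x, y), g(J(x), y))       # alternate convention: omega(v, w) = g(Jv, w)

def to_complex(x):
    n = len(x) // 2
    return [complex(x[j], x[n + j]) for j in range(n)]

random.seed(5)
for _ in range(100):
    x = [random.randint(-9, 9) for _ in range(6)]
    y = [random.randint(-9, 9) for _ in range(6)]
    z, w = to_complex(x), to_complex(y)
    assert h_alt(x, y) == sum(zj.conjugate() * wj for zj, wj in zip(z, w))
```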


I make no claim that the above is of "What is a $\ldots$ Hermitian metric?" quality.

Hermitian metrics are one place that typical undergraduate curricula gloss over. In my experience, one tends not to learn the above until studying complex manifolds or complex algebraic geometry.


Let us denote the canonical real inner product by $\langle v,w\rangle_{\mathbb{R}}$, where $v,w\in\mathbb{R}^n$.

If we want to define a complex inner product $\langle v,w\rangle_{\mathbb{C}}$ on $\mathbb{C}^n$, then, since $\mathbb{C}^n\supset\mathbb{R}^n$, it is natural to ask that it extend the real inner product. So if $v,w\in\mathbb{R}^n$ then $\langle v,w\rangle_{\mathbb{C}}=\langle v,w\rangle_{\mathbb{R}}$.

Now, we should also ask that $\langle v,v\rangle_{\mathbb{C}}\geq 0$, because we want to define the norm of $v$ as $\|v\|=\sqrt{\langle v,v\rangle_{\mathbb{C}}}$.

Thus, we want two properties for $\langle v,w\rangle_{\mathbb{C}}$:

  1. $\langle v,w\rangle_{\mathbb{C}}=\langle v,w\rangle_{\mathbb{R}}$, if $v,w\in\mathbb{R}^n$.
  2. $\langle v,v\rangle_{\mathbb{C}}\geq 0$, because we want to define $\|v\|$ such that $\|v\|^2=\langle v,v\rangle_{\mathbb{C}}$.

Now, let us consider $n=1$.

If $a,b\in\mathbb{R}^1$, then $\langle a,b\rangle_{\mathbb{C}}=\langle a,b\rangle_{\mathbb{R}}=ab$.

If $a\in\mathbb{C}^1$, then $\langle a,a\rangle_{\mathbb{C}}$ must be the square of a norm. But $\mathbb{C}^1$ has a canonical norm, the modulus $|a|$, and since we are trying to extend our definitions, let us use it. So $\langle a,a\rangle_{\mathbb{C}}=|a|^2=a\overline{a}$.

Thus, we obtained these two properties:

  1. $\langle a,b\rangle_{\mathbb{C}}=ab$, if $a,b\in\mathbb{R}$.
  2. $\langle a,a\rangle_{\mathbb{C}}=a\overline{a}$.

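So how should $\langle a,b\rangle_{\mathbb{C}}$ be defined for $a,b\in\mathbb{C}^1$? Answer: $\langle a,b\rangle_{\mathbb{C}}=a\overline{b}$, which satisfies both properties and is linear in $a$. (Conjugating the first factor instead, $\overline{a}b$, would satisfy them equally well; which factor to conjugate is a convention.)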

By analogy, in $\mathbb{C}^n$ the inner product should be $\langle(a_1,\ldots,a_n),(b_1,\ldots,b_n)\rangle=\sum_{i=1}^na_i\overline{b_i}$. (Here we could also say that if $(V,\langle\cdot,\cdot\rangle_1)$ and $(W,\langle\cdot,\cdot\rangle_2)$ are inner product spaces, then $V\times W$ is an inner product space with the inner product $\langle(v_1,w_1),(v_2,w_2)\rangle=\langle v_1,v_2\rangle_1+\langle w_1,w_2\rangle_2$; applying this repeatedly to copies of $\mathbb{C}^1$ gives the formula above.)
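A Python sketch of this answer's construction (helper names are ours): `ip1` is the candidate product on $\mathbb{C}^1$ and `ip` sums it coordinate by coordinate, as in the product-space remark. It checks both desired properties on random inputs, and that $\langle u,u\rangle$ comes out real and nonnegative on $\mathbb{C}^3$.

```python
import random

def ip1(a, b):
    """Candidate inner product on C^1: <a, b> = a * conj(b)."""
    return a * b.conjugate()

def ip(u, v):
    """On C^n: add up the C^1 products coordinate by coordinate (the product-space rule)."""
    return sum(ip1(a, b) for a, b in zip(u, v))

random.seed(6)
for _ in range(100):
    a, b = random.randint(-9, 9), random.randint(-9, 9)
    assert ip1(a, b) == a * b                           # property 1: agrees with ab on reals
    c = complex(random.randint(-9, 9), random.randint(-9, 9))
    assert ip1(c, c) == c.real ** 2 + c.imag ** 2       # property 2: <c, c> = |c|^2
    assert c.conjugate() * c == ip1(c, c)               # conj(a) b would pass both tests too
    u = [complex(random.randint(-9, 9), random.randint(-9, 9)) for _ in range(3)]
    assert ip(u, u).imag == 0 and ip(u, u).real >= 0    # so ||u||^2 = <u, u> makes sense
```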