How did mathematicians decide on the axioms of linear algebra?

So vector spaces, linear transformations, inner products, etc. all have their own axioms that they must satisfy in order to be considered what they are.

But how did we come to decide to include those axioms and not include others?

For example, why does this rule hold in inner product spaces: $c\langle u,v\rangle=\langle cu,v\rangle$, when my intuition says that it should be $c\langle u,v\rangle=\langle cu,cv\rangle$?

And how did we decide that preserving scalar multiplication and addition were sufficient criteria for something to be a linear map?


Linear algebra is one of the first "abstractions" that you encounter in mathematics that is not very well motivated by experience. (Well... there are numerals, which are a pretty tricky abstraction as well, but most of us don't recall learning those.)

It helps to have the backstory.

Mathematicians studied geometry and simple transformations of the plane like "rotation" and "translation" (moving everything to the right by 3 inches, or up by 7 inches, or northeast by 3.2 inches, etc.) as far back as Euclid, and at some point, they noticed that you could do things like "do one translation after another", and the result was the same as if you'd done some different translation. And even if you did the first two translations in a different order, the resulting translation was still the same. So pretty soon they said "Hey, these translations are behaving a little like numbers do when we add them together: the order we add them in doesn't matter, and there's even something that behaves the way zero does: the translation that doesn't move anything at all, when composed with any other translation, gives that other translation."

So you have two different sets of things: ordinary numbers, and "translations of the plane". For both, there's a way of combining ("+" for numbers, "composition of transformations" for translations); each of these combining rules has an identity element ("0" for addition, "don't move at all" for translation); and for both operations ("+" and "compose"), the order in which you combine doesn't matter. And then you start to realize something: if I proved something about numbers using only the notion of addition, the fact that there's an identity, and the fact that addition is commutative, I could just replace a bunch of words and I'd have a proof about the set of all translations of the plane!
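To make that concrete, write $T_{(a,b)}$ for the translation that moves every point $(x,y)$ to $(x+a,\,y+b)$ (the subscript notation is just for this illustration). Then, for example,

$$T_{(3,0)}\circ T_{(0,7)} = T_{(3,7)} = T_{(0,7)}\circ T_{(3,0)}, \qquad T_{(0,0)}\circ T_{(a,b)} = T_{(a,b)},$$

which is exactly how addition of pairs of numbers behaves, with the "don't move at all" translation $T_{(0,0)}$ playing the role of $0$.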

And the next thing you know, you're starting to realize that other things have these kinds of shared properties as well, so you say "I'm going to give a name to sets of things like that: I'll call them 'groups'." (Later, you realize that the commutativity of addition is kind of special, and you really want to talk about other operations as well, so you enlarge your notion of "group" and instead call these things "Abelian groups," after Abel, the guy who did a lot of the early work on them.)

The same thing happened with linear algebra. There are some sets of things that have certain properties, and someone noticed that they ALL had the same properties, and said "let's name that kind of collection". It wasn't a pretty development -- the early history of vectors was complicated by people wanting to have a way to multiply vectors in analogy with multiplying real numbers or complex numbers, and it took a long time for folks to realize that having a "multiplication" was nice, but not essential, and that even for collections that didn't have multiplication, there were still a ton of important results.

In a way, though, the most interesting thing was not the sets themselves -- the "vector spaces" -- but rather the class of maps that preserve the vector-space structure, i.e., addition and scalar multiplication. These are called "linear transformations", and they are a generalization of the transformations you learn about in Euclid.
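Spelled out, and this is also the answer to the question about why those two conditions are the criteria: a map $T \colon V \to W$ between vector spaces is called linear exactly when

$$T(u+v) = T(u) + T(v) \qquad \text{and} \qquad T(cu) = c\,T(u)$$

for all vectors $u, v$ and all scalars $c$. Addition and scalar multiplication are the only operations a vector space comes with, so preserving them is all there is to preserve.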

Why are these so important? One reason for their historical importance is that for a function from $n$-space to $k$-space, the derivative, evaluated at some point of $n$-space, is a linear transformation. In short: something we cared a lot about -- derivatives -- turns out to be very closely tied to linear transformations.

For a function $f: R \to R$, the derivative $f'(a)$ is usually regarded as "just a number". But consider for a moment $$ f(x) = \sqrt{x}, \qquad f'(x) = \frac{1}{2 \sqrt{x}}, \qquad f(100) = 10, \qquad f'(100) = \frac{1}{20}. $$ Suppose you wanted to compute the square root of a number that's a little way from 100, say, 102. We could say "we moved 2 units in the domain; how far do we have to move away from 10 (i.e., in the codomain)?" The answer is that the square root of $102$ is (very close to) the square root of $100$, displaced by $2 \cdot \frac{1}{20}$, i.e., to $10.1$. (In fact, $10.1^2 = 102.01$, which is pretty accurate!)

So we can regard "multiplication by $1/20$" as the derivative of square-root at $a = 100$, and this gives a linear transformation from "displacements near 100, in the domain" to "displacements near 10, in the codomain."
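In symbols, this is the linear-approximation picture: for a base point $a$ and a displacement $h$,

$$f(a+h) \approx f(a) + f'(a)\,h,$$

and the map $h \mapsto f'(a)\,h$ is the linear transformation being described. For a function from $n$-space to $k$-space, the same formula holds with $f'(a)$ a $k \times n$ matrix acting on the displacement vector $h$.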

The importance of derivatives made it really worthwhile to understand the properties of such transformations, and therefore to also understand their domains...and pretty soon, other situations that arose in other parts of math turned out to "look like those". For instance, the set of all polynomials of degree no more than $n$ turns out to be a vector space: you can add polynomials, you can multiply them by a constant, etc. And the space of all convergent sequences of real numbers turns out to be a vector space. And the set of all periodic functions of period 1 turns out to be a vector space. And pretty soon, so many things seemed to be sharing the same properties that someone gave "sets that have those particular properties" a name: vector spaces.
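As a quick check on one of these examples, take polynomials of degree at most $2$: if $p(x) = 1 + 3x$ and $q(x) = 2 - x + x^2$, then

$$p(x) + q(x) = 3 + 2x + x^2, \qquad 5\,p(x) = 5 + 15x,$$

both of which are again polynomials of degree at most $2$, and the zero polynomial plays the role of the zero vector. The remaining axioms come down to ordinary arithmetic of the coefficients.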

Nowadays, seeing each new thing that's introduced through the lens of linear algebra can be a great aid...so we introduce the general notion first, and many students are baffled. My own preference, in teaching linear algebra, is to look at three or four examples, like "period-1 periodic functions" and "convergent sequences" and "polynomials of degree no more than $n$", and have the students notice that there are some similarities, and only then define "vector space". But that's a matter of taste.


For the particular case of inner products being bilinear:

Inner products are intended to generalize the usual dot product from plane and space vector algebra. Therefore it would not make sense to require them to satisfy a property that the dot product itself doesn't have.

For example, in the plane we have $2\bigl[ (1,1)\cdot (1,2) \bigr] = 6$ whereas $$[2(1,1)]\cdot[2(1,2)]=(2,2)\cdot(2,4) = 12$$ so your proposed rule doesn't hold for the dot product.
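More generally, for a real inner product the axiom forces

$$\langle cu, cv\rangle = c\,\langle u, cv\rangle = c^2\langle u, v\rangle,$$

so scaling both arguments scales the value by $c^2$ (that's the factor of $4 = 2^2$ above, since $(1,1)\cdot(1,2) = 3$), not by $c$. The rule $c\langle u,v\rangle = \langle cu, v\rangle$ is exactly what makes the inner product behave like the dot product it generalizes.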