What does it mean to work without a basis?

When reading proofs or definitions on Wikipedia, I'm accustomed to seeing both a basis-dependent discussion and a basis-free discussion. Take, for example, the Wikipedia page on the tensor product, which has a "free vector space" discussion alongside a matrix-representation discussion.

While I understand not all vector spaces have a canonical basis, it's not formally clear to me what it means for a proof or definition to be basis-dependent vs. basis-free, and why it's better to be basis-free in the first place.

1) If I'm writing a proof or defining an object, what rules or criteria must I follow to be properly basis-free? And once I know what a basis-dependent proof or definition looks like, what is the strategy for generalizing it to a basis-independent proof or definition?

2) If all vector spaces can be represented in a (not necessarily canonical) basis, can't we always represent operators and members of that space with matrices and sums over linearly independent elements? My larger question, then, is: if we're really making no assumptions when we write down matrices or element-wise operations, why is it bad or ungentlemanly to choose a basis without loss of generality?


Apart from arguably being more elegant, coordinate-free (i.e., free from a choice of basis) methods are necessary in situations where bases are useless. Bases become useless for two reasons: there may not be a basis at all (e.g., in the infinite-dimensional vector spaces common in quantum mechanics, if one does not assume the Axiom of Choice), or bases are computationally cumbersome (e.g., in infinite-dimensional vector spaces with the Axiom of Choice, bases exist but cannot be computed explicitly; and in finite-dimensional vector spaces of large dimension, bases exist explicitly but working with coordinates requires a computer, while the coordinate-free argument can perfectly well be done by hand).
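A standard example illustrating both failure modes: consider $\mathbb{R}$ as a vector space over $\mathbb{Q}$. Without the Axiom of Choice the existence of a basis cannot be proved; with it, a (Hamel) basis exists and satisfies
$$\dim_{\mathbb{Q}}(\mathbb{R}) = 2^{\aleph_0},$$
yet no explicit basis has ever been exhibited, so "pick a basis and compute" gives no foothold.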

Part of this discussion is a general phenomenon in mathematics (and possibly life itself): making arbitrary choices often leads to trouble. That isn't a mathematical statement, though; it's more of a rule of thumb. Examples of the kind of trouble you run into are visible early on in, say, linear algebra. If one defines a property of a linear transformation between finite-dimensional spaces by saying "take a representing matrix, and do this and that to the matrix", then in order to truly claim to have defined a property of the linear transformation, one must check independence from the choice of representing matrix. If, instead, one can define the concept directly on the linear transformation, then that is a more direct definition that does not require the extra verification. Such definitions are sometimes a bit more advanced, but they tend to be much more elegant and to have much greater explanatory power.
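A classic illustration of that extra obligation: define the trace of a linear transformation $T$ by "take a representing matrix $A$ and sum its diagonal entries". Before this counts as a definition of a property of $T$, one must check that the answer does not depend on the basis: if $P^{-1}AP$ is another representing matrix, then
$$\operatorname{tr}(P^{-1}AP)=\operatorname{tr}\bigl((AP)P^{-1}\bigr)=\operatorname{tr}(A),$$
using $\operatorname{tr}(XY)=\operatorname{tr}(YX)$. Only after this verification does $\operatorname{tr}(T)$ make sense.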

Since in linear algebra what we really care about are linear transformations, and the matrices are merely there to represent them for us, matrices are a tool: a computational aid that depends on coordinates. No doubt they are very helpful in computations, and they often allow one to define concepts, and even prove facts, very quickly but uninformatively. But the arbitrary choice adds a layer of obstruction to understanding what is going on. It is often best, even if it requires slightly more effort, to remove the obstruction and marvel at the abstraction.

For instance, the concept of similarity of matrices can be defined in two ways. Two square matrices $A,B$ of the same order are similar if $A=P^{-1}BP$ for some invertible matrix $P$. This is a very short and easy-to-memorise definition, and it can be used to prove lots of things, but its purpose is obscure, hidden behind an arbitrary choice. An equivalent definition is that $A,B$ are similar if there exists a single linear transformation $T$ such that $A$ is a representation of $T$ in some basis and $B$ is a representation of the same $T$ in some (other) basis. So matrices are similar precisely when they represent the same linear transformation, perhaps using different bases. Think now of results that one is often asked to prove using the first definition; for instance, that if $A$ and $B$ are similar, then they have the same eigenvalues. The proof is easy, but it seems a bit like magic: one cannot see the reason why it's true, or how one would come to guess that such a result might hold. With the second definition it is obvious: since $A,B$ represent the same linear transformation $T$, it is highly plausible that the eigenvalues of $A$ and of $B$ are the eigenvalues of $T$, so in particular they must agree.
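For completeness, here is the short computation hiding behind the first definition: if $A=P^{-1}BP$ and $Av=\lambda v$ with $v\neq 0$, then
$$B(Pv)=(PAP^{-1})(Pv)=P(Av)=\lambda\,(Pv),$$
and $Pv\neq 0$ since $P$ is invertible, so $\lambda$ is an eigenvalue of $B$; by symmetry the eigenvalues agree. The verification is correct and short, yet it gives no hint that the underlying reason is "same transformation, different bases".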


The first thing you need to understand is that "basis-free" and "basis-dependent" are not crisp technical terms; they do not have formal definitions. Things that are true are true no matter how much or how little their definitions and proofs make use of bases; what we have here is a fuzzy concept that we use for talking about definitions and proofs as acts of communication between human beings, beyond their formal content.

A basis-free definition is one where it is immediately obvious that what it defines does not depend on a choice of basis -- usually because it does not mention a basis or coordinates at all. At the other end of the spectrum there are definitions that directly mention bases; before we can agree that such a definition defines a property of abstract vector spaces, we need to actually prove that applying it with two different bases gives the same result.
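For instance, $\ker(T)=\{v\in V: Tv=0\}$ is manifestly basis-free: no basis is mentioned. By contrast, "the rank of $T$ is the number of pivots in the row-reduced form of a representing matrix" sits at the other end; it only becomes a definition on abstract vector spaces once one proves that every representing matrix yields the same number.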

Between these two extremes there is a sparsely populated grey area where it will be obvious to some but not all readers that the definition yields a concept for abstract vector spaces. For example, in tensor calculus, definitions that use the Einstein summation convention look extremely basis-dependent at first glance if the convention has been defined in terms of invisible $\sum$ signs over concrete indices. However, if we write down exactly the same symbols (and make sure to follow some sanity rules that are not terribly important in the invisible-$\sum$ picture), we can call it "abstract index notation" instead, and someone familiar with that can immediately see how the expression encodes a particular combination of operations he has already convinced himself are independent of basis, so the whole thing is therefore basis-free.
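Schematically, the same string of symbols
$$w^a = T^a{}_b\,v^b$$
can be read either as the coordinate statement $w^i=\sum_j T^i{}_j v^j$ for some concrete basis (basis-dependent at face value), or, in abstract index notation, as the basis-free statement $w=T(v)$, with the indices merely recording which slot contracts with which.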

And the point here is that it is okay that there is a grey area, because the distinction between basis-free and basis-dependent is not a technical one.

For proofs the situation is slightly different. Once we prove something, it is proven, and it does not matter, for the technical soundness of the proof, whether we chose a basis somewhere in it. (In other words, in contrast to definitions, using a basis in a proof does not hit you with additional proof obligations.)
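For example, the identity $\operatorname{tr}(ST)=\operatorname{tr}(TS)$ for operators on a finite-dimensional space is usually proved by picking a basis and computing
$$\operatorname{tr}(ST)=\sum_{i,j}S_{ij}T_{ji}=\sum_{i,j}T_{ji}S_{ij}=\operatorname{tr}(TS).$$
The conclusion is a basis-free statement about the operators, and it is fully proven; nothing further needs to be checked, even though a basis appeared in the middle.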

The reason why one might still care is again related to communication. A proof really has (at least) two purposes. The first is to convince you that what the theorem states is actually true. This is where formal correctness comes in, and where it doesn't matter whether you use a basis or not.

But the second purpose of a proof is to convey some intuition about why the thing claimed is necessarily true. A proof where the QED comes out of an impenetrable flurry of coordinate algebra is still a valid proof: it does establish that the goal is true. But it is often not terribly useful for answering "how can I think about this theorem so that it is intuitively clear to me that it has to be true?" Proofs that don't speak about coordinates -- when they are available! -- generally tend to be better at that.

The real goal here is that proofs ought to be clear and convey useful intuition. Being coordinate-free is not a goal in itself, but merely a rule of thumb for how to reach that goal. Sometimes, perhaps, coordinate manipulation is exactly the way to produce the clearest proof. (This can be the case, for example, when we can choose a particularly nice basis, such as one that diagonalizes some operator we're talking about.)
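For instance, if an operator $T$ is diagonalizable, then choosing an eigenbasis turns its matrix into $\operatorname{diag}(\lambda_1,\dots,\lambda_n)$, and facts such as
$$\operatorname{tr}(T)=\sum_{i=1}^{n}\lambda_i, \qquad \det(T)=\prod_{i=1}^{n}\lambda_i$$
become immediate coordinate computations; here the coordinates clarify rather than obscure.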

Also, of course, sometimes the best we can do is to produce a proof that happens to be a maze of complicated algebra. Then that's still a valid proof, though we might have hoped for something better (and may keep searching for something better, on the back burner).