Why is it important that a basis be orthonormal?

Why is it important that a basis be orthonormal? The requirement of an orthonormal basis is repeated so often in linear algebra that it seems the subject depends on it. What is gained (or lost) if the basis is not orthonormal? Is it just convention and convenience, or does it restrict attention to the type of problems for which the assumption of orthonormality is beneficial?


When a basis is orthonormal, a vector is simply the sum of its orthogonal projections onto the various members of the basis.

That is not true of bases in general. For a simple counterexample, take the basis $\{(1,0),(1,1)\}$ of $\mathbb R^2$: the vector $(0,1)$ has orthogonal projections $(0,0)$ and $\tfrac12(1,1)$ onto the two basis vectors, and these sum to $(\tfrac12,\tfrac12)\ne(0,1)$.
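To make this concrete, here is a minimal numpy sketch (the particular vectors are my own illustration, not from the original answer): it sums the orthogonal projections of a vector onto each basis vector, which recovers the vector for an orthonormal basis but not for the counterexample basis above.

```python
import numpy as np

def proj(v, b):
    """Orthogonal projection of v onto the line spanned by b."""
    return (v @ b) / (b @ b) * b

v = np.array([0.0, 1.0])

# Orthonormal basis: the projections sum back to v.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(proj(v, e1) + proj(v, e2))   # [0. 1.]  -> equals v

# Non-orthonormal basis {(1,0), (1,1)}: the projections do NOT sum to v.
b1, b2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(proj(v, b1) + proj(v, b2))   # [0.5 0.5] -> not v
```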


Often orthogonality represents a form of independence (as in statistics, where it indicates a lack of covariance, or in linear algebra, where direct sums of inner product spaces carry a canonical inner product for which the summands are orthogonal). An orthogonal basis then gives the ability to decompose an effect into separate, independent, non-interacting parts that simply add up to form the whole effect. This kind of decomposition is hugely important in situations where it can be done.


The question is somewhat equivalent to asking "why is it important to use the standard basis in $\mathbb R^n$?". The answer is that sometimes it is important and sometimes it is not, but clearly it is convenient to use the standard basis. So, what do you do if you have a vector space that does not have a standard basis? Or, more fundamentally, what is so standard about the standard basis?

Well, the standard basis is an orthonormal basis with respect to a very familiar inner product, and any orthonormal basis has the same kind of nice properties that the standard basis has.

As with everything, the choice of basis should be made with consideration for the problem one is trying to solve. In some cases, orthonormal bases will help. If there is no particular reason to prefer one basis over another, then choosing an orthonormal one will likely make computations easier.


Consider a vector: $$ v=ax_1+bx_2, $$ where $a, b \in \mathbb R$ and $x_1$, $x_2$ are basis vectors. Let's find the projection of $v$ onto $x_1$.

1. Non-orthogonal basis:
Let $x_1, x_2$ be a non-orthogonal basis. $$ \langle v,x_1 \rangle =\langle ax_1,x_1\rangle +\langle bx_2,x_1\rangle \\ =a\langle x_1,x_1\rangle +b\langle x_2,x_1\rangle. $$
Since the basis is non-orthogonal, $\langle x_2,x_1\rangle \ne 0$. So, to compute the projection of $v$ onto $x_1$, we have to use two inner products, $\langle x_1,x_1\rangle$ and $\langle x_2,x_1\rangle$.

2. Orthogonal basis:
Let $x_1, x_2$ be an orthogonal basis. $$ \langle v,x_1 \rangle =\langle ax_1,x_1\rangle +\langle bx_2,x_1\rangle \\ =a\langle x_1,x_1\rangle +b\langle x_2,x_1\rangle \\ =a\langle x_1,x_1\rangle, $$ since $\langle x_2,x_1\rangle =0$ by the definition of orthogonality. So, to compute the projection of $v$ onto $x_1$, we need only one inner product, $\langle x_1,x_1\rangle$. Therefore, an orthogonal basis allows projections to be computed more easily than a non-orthogonal one.

3. Orthonormal basis:
Let $x_1, x_2$ be an orthonormal basis. $$ \langle v,x_1 \rangle =\langle ax_1,x_1\rangle +\langle bx_2,x_1\rangle\\ =a\langle x_1,x_1\rangle +b\langle x_2,x_1\rangle\\ =a\langle x_1,x_1\rangle \\ =a, $$ since $\langle x_1,x_2\rangle =0$ by the definition of orthogonality and $\langle x_1,x_1\rangle =1$ by normalization. So we can read off the projection of $v$ onto $x_1$ without computing any inner product: the projections are just the coefficients of the corresponding basis components. Since an orthonormal basis requires no computation to find a projection, it is the best basis to use.
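A small numpy sketch of the same comparison (the particular vectors and coefficients are my own illustration): with an orthonormal basis the coefficients fall out as single inner products, while with a non-orthogonal basis the inner products mix the coefficients, and one has to solve a small linear system built from the pairwise inner products to recover them.

```python
import numpy as np

a, b = 2.0, 3.0

# Orthonormal basis: coefficients are read off as inner products.
x1, x2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
v = a * x1 + b * x2
print(v @ x1, v @ x2)              # 2.0 3.0 -> the coefficients directly

# Non-orthogonal basis: inner products mix the coefficients,
# and recovering (a, b) requires solving the system of inner products.
x1, x2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
v = a * x1 + b * x2
print(v @ x1, v @ x2)              # 5.0 8.0 -> not the coefficients
G = np.array([[x1 @ x1, x2 @ x1],
              [x1 @ x2, x2 @ x2]])
print(np.linalg.solve(G, np.array([v @ x1, v @ x2])))  # [2. 3.]
```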


Calculating the scalar product (and hence lengths and angles) of vectors given in coordinates is much simpler with respect to an orthonormal basis.

If $b_1,\ldots,b_n$ is any basis, and a scalar product $\langle-,-\rangle$ is given on the vector space (else it wouldn't even make sense to speak about 'orthogonality'), then to express scalar products in coordinates we have to form the so-called Gram matrix of the basis: $$\Gamma:=(\langle b_i,b_j\rangle)_{i,j}.$$ Then $$\left\langle\sum_i\alpha_ib_i,\,\sum_i\beta_ib_i\right\rangle=\sum_{i,j}\alpha_i\,\langle b_i,b_j\rangle\,\beta_j={\bf\alpha}^T\Gamma{\bf\beta}\,,$$ where ${\bf\alpha}^T=(\alpha_1,\ldots,\alpha_n)$.

If $(b_i)$ is an orthonormal basis, then $\Gamma$ is the identity matrix.
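As a quick numerical check (a minimal numpy sketch with an arbitrarily chosen basis, my own illustration), the formula ${\bf\alpha}^T\Gamma{\bf\beta}$ agrees with the scalar product computed directly from the vectors:

```python
import numpy as np

# Basis vectors as the columns of B; with the standard inner product,
# the Gram matrix (<b_i, b_j>)_{i,j} is B^T B.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Gamma = B.T @ B

alpha = np.array([2.0, 3.0])       # coordinates of u in the basis
beta  = np.array([1.0, -1.0])      # coordinates of w in the basis

u, w = B @ alpha, B @ beta
print(u @ w)                       # -3.0, computed directly from the vectors
print(alpha @ Gamma @ beta)        # -3.0, via the Gram matrix formula
```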

Why would one carry $\Gamma$ through all formulas that use coordinates and scalar products, when we can always construct (by the Gram-Schmidt process) an orthonormal basis starting from an arbitrary one?
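For completeness, here is a sketch of classical Gram-Schmidt in numpy (my own minimal implementation, for illustration only):

```python
import numpy as np

def gram_schmidt(basis):
    """Classical Gram-Schmidt: orthonormalize a list of independent vectors."""
    ortho = []
    for v in basis:
        w = v.astype(float)
        for q in ortho:
            w = w - (w @ q) * q    # subtract the component along q
        ortho.append(w / np.linalg.norm(w))
    return ortho

q1, q2 = gram_schmidt([np.array([1.0, 0.0]), np.array([1.0, 1.0])])
print(q1, q2)                      # [1. 0.] [0. 1.]
print(q1 @ q2, q1 @ q1, q2 @ q2)   # 0.0 1.0 1.0 -> orthonormal
```

In floating-point practice one would prefer the modified variant or a QR factorization (e.g. `numpy.linalg.qr`), since classical Gram-Schmidt can lose orthogonality to rounding error, but the idea is the same.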