Why does scalar projection not yield coordinates?
Suppose we have an ordered basis $\{v_1,\dots,v_n\}$ of some inner product space. Let us project a vector $v$ onto each $v_i$ by multiplying the unit vector $v_i/\|v_i\|$ by the "scalar projection" $(v,v_i)/\|v_i\|$. Intuitively, each scalar projection $(v,v_i)/\|v_i\|$ seems to indicate the amount of $v$ that goes in the direction of $v_i$, so the $i^{th}$ coordinate of $v$ should be $(v,v_i)/\|v_i\|$. But that does not happen unless the basis is orthonormal.
Mathematically I can justify this, but can someone give an intuitive reason as to what goes wrong? For example, what goes wrong with $B=\{(1,0),(1,1)\}$ in the Euclidean space $\mathbb R^2$ and $v=(0,1)$?
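For concreteness, here is a quick numerical check of this example (just a sketch using numpy; the variable names are illustrative):

```python
import numpy as np

v1, v2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
v = np.array([0.0, 1.0])

# Candidate coordinates from the "scalar projections" (v, v_i)/||v_i||.
scalar_proj = np.array([v @ v1 / np.linalg.norm(v1),
                        v @ v2 / np.linalg.norm(v2)])

# Actual coordinates: solve B c = v, where B has v1, v2 as columns.
B = np.column_stack([v1, v2])
coords = np.linalg.solve(B, v)

print(scalar_proj)   # [0.         0.70710678]
print(coords)        # [-1.  1.]   i.e. v = -1*v1 + 1*v2
```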
In a nutshell, the problem is that you end up overcounting components of the vector in “overlapping” directions. You’re trying to decompose the vector $w$ into $\sum\mathbf\pi_kw$, where $\mathbf\pi_k$ is orthogonal projection onto $v_k$. If $v_i$ and $v_j$ are not orthogonal, then $\mathbf\pi_jv_i\ne0$, so if $\mathbf\pi_iw\ne0$, it will make an excess contribution to the projection onto $v_j$. To put it another way, if $v_i$ and $v_j$ are not orthogonal, this introduces an undesirable dependency between the $i$th and $j$th coordinates of $w$: if we change the $i$th coordinate, this excess contribution of $v_i$ in the $v_j$ direction will also change the $j$th coordinate.
In order to produce coordinates with these individual projections you have to eliminate this “overlap” among the non-orthogonal basis vectors, which is precisely what the Gram-Schmidt process does.
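As a rough sketch of this (using numpy, with the basis from the question; the `gram_schmidt` helper below is just an illustration, not a library routine), one can check that after orthogonalizing the basis, the orthogonal projections do add back up to $w$:

```python
import numpy as np

def gram_schmidt(vectors):
    """Gram-Schmidt orthogonalization of a list of linearly independent vectors."""
    ortho = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for e in ortho:
            u -= (u @ e) / (e @ e) * e   # remove the "overlap" with earlier vectors
        ortho.append(u)
    return ortho

v1, v2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
w = np.array([0.0, 1.0])

u1, u2 = gram_schmidt([v1, v2])                  # u1 = (1, 0), u2 = (0, 1): overlap removed
recon = sum((w @ u) / (u @ u) * u for u in (u1, u2))
print(np.allclose(recon, w))                     # True: the projections now add up to w
```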
Addition: The phenomenon can be illustrated in $\mathbb R^2$. Let $v_1=(1,0)^T$ and $v_2=(1,1)^T$. We can see in the diagram below that if we add up the orthogonal projections of $w$ onto these two vectors, we don’t end up with $w$.
This diagram also suggests a way to salvage our decomposition via projections. Instead of projecting orthogonally, project parallel to the other basis vector (indicated by the black dotted lines). We then get the familiar parallelogram addition diagram that we all know and love. This provides another intuition into what’s going on: in a sense, orthogonal projection goes in the wrong direction.
We can also see from this diagram exactly what the excess contributions are. If $\mathbf\pi_1'$ and $\mathbf\pi_2'$ are the two parallel projections, then $\mathbf\pi_2w$ is too long by exactly $\mathbf\pi_2\mathbf\pi_1'w$, i.e., by the orthogonal projection onto $v_2$ of the actual component of $w$ in the $v_1$ direction, and similarly for $\mathbf\pi_1w$.
The salient feature of these parallel projections is that $\mathbf\pi_2'v_1=\mathbf\pi_1'v_2=0$. This property can be extended to higher-dimensional spaces. Instead of using orthogonal projection, we want projections such that $\mathbf\pi_i'v_j=0$ when $i\ne j$ or, equivalently, $\ker{\mathbf\pi_i'}=\operatorname{span}(B\setminus\{v_i\})$. Constructing such projections is relatively straightforward.
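For instance, here is a minimal numerical sketch (numpy, in $\mathbb R^2$, with illustrative helper names). Since the coordinate functionals are the rows of $B^{-1}$, where $B$ has the $v_i$ as columns, one can take $\mathbf\pi_i'x=(B^{-1}x)_i\,v_i$ and check both that $\mathbf\pi_i'v_j=0$ for $i\ne j$ and the excess-contribution identity above:

```python
import numpy as np

v1, v2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
w = np.array([0.0, 1.0])
B = np.column_stack([v1, v2])
B_inv = np.linalg.inv(B)            # rows of B_inv are the coordinate functionals

def parallel_proj(i, x):
    """Oblique projection onto v_i along the span of the other basis vector(s)."""
    return (B_inv[i] @ x) * B[:, i]

def orth_proj(u, x):
    """Orthogonal projection of x onto u."""
    return (x @ u) / (u @ u) * u

print(parallel_proj(0, v2), parallel_proj(1, v1))   # both [0. 0.]: pi_i' v_j = 0
print(np.allclose(parallel_proj(0, w) + parallel_proj(1, w), w))   # True: coordinates recovered
# Excess contribution from above: pi_2 w = pi_2' w + pi_2 pi_1' w
print(np.allclose(orth_proj(v2, w),
                  parallel_proj(1, w) + orth_proj(v2, parallel_proj(0, w))))  # True
```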
I think your intuition should be that there is a huge difference between "the portion of $v$ that points in the direction of $v_1$" and "the portion of $v_1$ that is necessary to compose $v$". You may be carrying over the intuition that they are the same from the orthogonal case, but you really shouldn't be. Here are a few reasons why:
- $v_2$ may also point significantly in the direction of $v_1$, but it makes no sense to include this in the coefficient of $v_1$. E.g. consider $v = 1v_1 + 1v_2$ and dot both sides with $v_1$.
- The portion of $v$ pointing towards $v_1$ is a quantity that depends purely on $v$ and $v_1$, but the basis representation of $v$ depends critically on all the basis vectors. It's unreasonable to expect the former to be able to tell you the latter, even approximately.
- Fundamentally, representing a vector in terms of a basis is an inversion problem (it involves solving a matrix equation). The process you are proposing is fundamentally a multiplication process (simplify to the case of a basis of unit vectors and it literally is multiplication by $B^T$, where $B$ is the matrix whose columns are the basis vectors). So in a very general, abstract sense you are doing it backwards. But in the special case of an orthonormal basis, $B$ is an orthogonal matrix and $B^{-1} = B^T$, so we can indeed invert by multiplying. The fallacy is believing that, in the generic case, one can compute $B^{-1}$ just by rescaling $B^T$ by some squared norms: that's not how matrix inverses work (see the numerical sketch after this list).
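Here is a short numerical illustration of the "inversion vs. multiplication" point (a sketch with numpy; the random basis and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A generic (non-orthonormal) basis of unit vectors as the columns of B.
B = rng.normal(size=(3, 3))
B /= np.linalg.norm(B, axis=0)            # unit columns, but not orthogonal
v = rng.normal(size=3)

coords_by_dotting = B.T @ v               # "multiplication": scalar products with basis vectors
coords_true = np.linalg.solve(B, v)       # "inversion": actually solving B c = v

print(np.allclose(coords_by_dotting, coords_true))   # False in general

# With an orthonormal basis, B^{-1} = B^T and the two agree.
Q, _ = np.linalg.qr(B)
print(np.allclose(Q.T @ v, np.linalg.solve(Q, v)))   # True
```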
If we take a basis $\{e_i\}_{i=1}^n$ (not orthonormal in general) of an inner product space, any vector $X$ can be expressed as a linear combination of the basis vectors:
$$X=\sum_iX^ie_i$$
The coefficients of this linear combination are called the contravariant components of $X$ with respect to the basis, i.e. the components that, in a Euclidean space, form the vector when we add them up with the "parallelogram law".
On the other hand, there exist other components, called the covariant components, which are the projections of the vector $X$ along the directions of the basis vectors; you get them via the scalar product:
$$X_i=X\cdot e_i$$
As you can see, I used $X^i$ for the contravariant components and $X_i$ for the covariant ones. The distinction matters in a non-orthogonal basis, but it becomes irrelevant in an orthonormal basis, where the two kinds of components coincide.
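Here is a brief numerical sketch of the two kinds of components (numpy, with the basis from the question; names are illustrative). One standard way to relate them is through the Gram matrix $g_{ij}=e_i\cdot e_j$, since $X_i=X\cdot e_i=\sum_j X^j\,e_j\cdot e_i=\sum_j g_{ij}X^j$:

```python
import numpy as np

# Non-orthogonal basis in the plane, as the columns of E.
e1, e2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
E = np.column_stack([e1, e2])
X = np.array([0.0, 1.0])

contravariant = np.linalg.solve(E, X)   # X^i: coefficients in X = sum_i X^i e_i
covariant = E.T @ X                     # X_i = X . e_i

g = E.T @ E                             # Gram matrix g_ij = e_i . e_j
print(contravariant)                    # [-1.  1.]
print(covariant)                        # [0. 1.]
print(np.allclose(covariant, g @ contravariant))   # True: X_i = sum_j g_ij X^j
```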