Why do non-orthogonal basis functions encode 'redundant' information in transforms?

I'm currently learning about wavelets and keep running into this idea of redundancy in the continuous wavelet transform. What I've gathered so far is that there is some 'commonality' between the wavelets used in the continuous wavelet transform, so performing the full transform collects redundant information.

However, I'm not sure I fully understand this. Why does this topic of redundancy never come up with the Fourier transform? My guess is that it's because sinusoids of different frequencies are orthogonal to one another, so the Fourier transform doesn't encode redundant information.

So I've concluded that orthogonal basis functions encode no redundant information about the signal. However, I'm looking for an 'Explain Like I'm Five' type explanation of why orthogonal basis functions remove redundancy.

For some reason I have a feeling that the entire concept could be explained in 2D space using projections... but I'll wait and see.


Listen to your gut.

Let’s look at a pair of linearly independent unit vectors $\mathbf u$ and $\mathbf v$ in $\mathbb R^2$. (They don’t really have to be unit vectors, but omitting all of the normalization factors that would otherwise be necessary reduces clutter.)

*[diagram: two linearly independent unit vectors $\mathbf u$ and $\mathbf v$, with $\mathbf v$ having a non-zero component parallel to $\mathbf u$]*

If $\mathbf v$ is not orthogonal to $\mathbf u$, then they overlap: there’s a component of $\mathbf v$ that’s parallel to $\mathbf u$, i.e., $\mathbf v$ contains a redundant non-zero scalar multiple of $\mathbf u$. Similarly, $\mathbf u$ has a redundant $\mathbf v$-component.

If we have an orthonormal basis $(\mathbf u,\mathbf v)$ of $\mathbb R^2$, we can express a vector $\mathbf w$ as a linear combination of the basis vectors via orthogonal projection: $$\mathbf w=\pi_{\mathbf u}\mathbf w+\pi_{\mathbf v}\mathbf w=(\mathbf u\cdot\mathbf w)\mathbf u+(\mathbf v\cdot\mathbf w)\mathbf v.$$ If we try to do this with non-orthogonal basis vectors, however, it doesn’t work.

*[diagram: the sum of the orthogonal projections of $\mathbf w$ onto non-orthogonal $\mathbf u$ and $\mathbf v$ overshoots $\mathbf w$; red and blue vectors mark the redundant contributions]*

The problem is that those overlaps between $\mathbf u$ and $\mathbf v$ are overcounted when we add up the individual projections. The red vector in the diagram above is the redundant contribution of the orthogonal projection onto $\mathbf v$, and the blue is the redundant contribution from the projection onto $\mathbf u$.
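You can see this overcounting numerically. Here's a minimal sketch in Python/NumPy (the particular vectors are arbitrary choices of mine): summing the orthogonal projections $(\mathbf u\cdot\mathbf w)\mathbf u+(\mathbf v\cdot\mathbf w)\mathbf v$ recovers $\mathbf w$ exactly when the pair is orthonormal, but overshoots when it isn't.

```python
import numpy as np

# Orthonormal basis of R^2: summing the orthogonal projections recovers w exactly.
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
w = np.array([2.0, 1.0])
w_rec = (u @ w) * u + (v @ w) * v
print(w_rec)  # [2. 1.] -- exact reconstruction

# Non-orthogonal unit vectors: the same recipe overcounts the overlap.
v = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])  # 45 degrees from u
w_bad = (u @ w) * u + (v @ w) * v
print(w_bad)  # [3.5 1.5] -- not w; the u-v overlap was counted twice
```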

The same thing occurs when the vectors are functions instead of elements of $\mathbb R^2$. If the basis vectors aren’t orthogonal, then they overlap to some degree and individual orthogonal projections onto them contribute redundant elements to the sum.

The Gram-Schmidt process finds and eliminates such redundancies among a set of vectors. Another way to eliminate them is to change the direction of projection so that it’s parallel to the other basis vector. In higher-dimensional spaces, that can be generalized to projecting parallel to the given basis vector onto its complement and then subtracting that projection from the original vector, but that’s pretty much what you do when applying the Gram-Schmidt process.
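For concreteness, here is a bare-bones Gram-Schmidt sketch in the same NumPy setting (the function name and tolerance are my own choices): each incoming vector has its redundant components along the already-orthonormalized vectors subtracted out before being normalized.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: subtract each vector's redundant components
    along the previously orthonormalized vectors, then normalize."""
    basis = []
    for v in vectors:
        # Remove the overlap with every basis vector found so far.
        for q in basis:
            v = v - (q @ v) * q
        norm = np.linalg.norm(v)
        if norm > 1e-12:  # skip vectors that were entirely redundant
            basis.append(v / norm)
    return basis

u = np.array([1.0, 0.0])
v = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
q1, q2 = gram_schmidt([u, v])
print(q1, q2, q1 @ q2)  # an orthonormal pair; dot product ~ 0
```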


To give a more functional-analytic answer to complement @amd's (excellent) answer: the reason that orthogonal decompositions do not encode redundant information is Parseval's identity, $$\lVert f\rVert^{2}=\sum_{n=1}^{\infty}\lvert\langle f,g_{n}\rangle\rvert^{2},$$ which says that taking inner products of a function with an orthonormal basis $(g_{n})$ preserves the energy of the function (i.e., its $L^{2}$-norm). It is tempting to conclude that orthogonality is always better, but this is not always true: sometimes we can achieve a generalization of Parseval's identity via something called a "frame". We say that a sequence of functions $g_{n}(x)$ satisfies the frame condition if there are constants $0<A\leq B<\infty$ such that $$A\lVert f\rVert^{2}\leq\sum_{n=1}^{\infty}\lvert\langle f,g_{n}\rangle\rvert^{2}\leq B\lVert f\rVert^{2}$$ for every $f$, and this allows us to relax the orthogonality requirement on the $g_{n}$. These types of decompositions are particularly important in wavelet analysis, where frames are often much easier to work with than orthonormal bases, especially in industry settings.
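As a small finite-dimensional illustration (my own sketch, not part of the answer above): three equally spaced unit vectors in $\mathbb R^2$ are pairwise non-orthogonal, yet they form a tight frame with $A=B=\tfrac32$, so the frame coefficients preserve energy up to a constant factor.

```python
import numpy as np

# A "Mercedes-Benz" frame for R^2: three unit vectors at 120-degree spacing.
# No two are orthogonal (R^2 can't hold 3 orthogonal vectors), yet they
# satisfy the frame condition with A = B = 3/2 (a tight frame).
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
frame = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)

rng = np.random.default_rng(0)
for _ in range(3):
    f = rng.standard_normal(2)
    energy = np.sum((frame @ f) ** 2)  # sum_n |<f, g_n>|^2
    print(energy / (f @ f))            # prints 1.5 for every f
```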


A third aspect (often just as important in practice) is that the space of functions you analyze for a particular application is likely to be a rather small subset of the set of all possible functions that a basis (or frame) guarantees we can represent. The functions you will never encounter don't need to be accounted for when designing your wavelets.