Geometric intuition behind convergence of Fourier series
Solution 1:
I don't know about a geometric interpretation, but here is a brief sketch of a proof. First we need to be precise about what we mean by "convergence." Fourier series don't always converge in the naive sense, that is, pointwise. (If you change the value of a function at a single point, its Fourier series remains unchanged.) The sense in which they do always converge is in the Hilbert space $L^2([0, 1])$, whose inner product $\langle f, g \rangle = \int_0^1 \overline{g(x)} f(x)\, dx$ induces a norm, which in turn induces a metric. In $L^2([0, 1])$, let $X$ be the subspace spanned by the functions $e^{2\pi i nx}, n \in \mathbb{Z}$. It is fairly straightforward to verify that the functions $e^{2\pi i nx}$ are orthogonal and have norm $1$; I generally think about this in a representation-theoretic way, as a special case of the orthogonality relations for characters.
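If you want to check the orthonormality numerically, here is a minimal Python sketch (the helper name and grid size are illustrative choices, not anything canonical); it exploits the fact that for these exponentials a left-endpoint Riemann sum on $K$ equally spaced points is exact whenever $|m - n| < K$.

```python
import numpy as np

def inner_product(m, n, K=1024):
    """Approximate <e_m, e_n> = int_0^1 e^{2 pi i m x} conj(e^{2 pi i n x}) dx
    by a left-endpoint Riemann sum on K equally spaced points.
    For these exponentials the sum is exact whenever |m - n| < K."""
    x = np.arange(K) / K
    return np.mean(np.exp(2j * np.pi * m * x) * np.conj(np.exp(2j * np.pi * n * x)))

# <e_n, e_n> = 1 and <e_m, e_n> = 0 for m != n, up to floating-point error
print(abs(inner_product(3, 3)))   # ~1.0
print(abs(inner_product(3, -5)))  # ~0.0
```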
Then the statement that Fourier series converge is equivalent to the statement that $X$ is dense in $L^2([0, 1])$. Why? Given a sequence in $X$ converging to an element of $L^2([0, 1])$, we can compute the Fourier coefficients, which depend continuously on the sequence (by Cauchy-Schwarz, $|\langle f, e_n \rangle - \langle g, e_n \rangle| \le \|f - g\|_2$) and hence converge to a limit. That these coefficients actually represent the element of $L^2([0, 1])$ is a standard Hilbert space argument; take a course in functional analysis if you want to learn this kind of thing thoroughly.
Now, something else you need to know about $L^2([0, 1])$ is that the subspace $Y$ consisting of all step functions is dense in it. (If you have trouble believing this, first convince yourself that $Y$ is dense in the continuous functions on $[0, 1]$ and then believe me that the continuous functions are dense in $L^2([0, 1])$. In fact, $L^2([0, 1])$ can be defined as the completion of $C([0, 1])$ with respect to the $L^2$ norm.) So to show that $X$ is dense, it suffices to show that the closure of $X$ contains $Y$. In fact, it suffices to show that $X$ has as a limit point a step function with a single bump, say
$$a(x) = \begin{cases} 0 & \text{if } 0 \le x \le \frac{1}{3} \text{ or } \frac{2}{3} \le x \le 1 \\ 1 & \text{otherwise} \end{cases}$$
and to take linear combinations, translations, and dilations of this. In other words, it suffices to prove convergence for square waves. But one can do the computations directly here. There is a standard picture to stare at, and of course if you have ever actually heard a square wave you should believe that audio engineers, at least, are perfectly capable of approximating square waves by sines and cosines.
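To make "one can do the computations directly" concrete, here is a short Python sketch (helper names and grid size are illustrative): it uses the closed-form coefficients $c_n = \int_{1/3}^{2/3} e^{-2\pi i n x}\,dx$ of the bump above and watches the $L^2$ error of the symmetric partial sums shrink.

```python
import numpy as np

def c(n):
    """Fourier coefficient c_n = int_{1/3}^{2/3} e^{-2 pi i n x} dx of the bump a(x)."""
    if n == 0:
        return 1 / 3
    return (np.exp(-2j * np.pi * n * 2 / 3) - np.exp(-2j * np.pi * n / 3)) / (-2j * np.pi * n)

def partial_sum(x, N):
    """Symmetric partial sum S_N(x) = sum_{|n| <= N} c_n e^{2 pi i n x}."""
    return sum(c(n) * np.exp(2j * np.pi * n * x) for n in range(-N, N + 1))

x = np.linspace(0, 1, 4001)
a = ((x > 1/3) & (x < 2/3)).astype(float)
for N in [5, 20, 80]:
    err = partial_sum(x, N) - a
    # L^2 error via a Riemann sum; it decreases toward 0 as N grows
    print(N, np.sqrt(np.mean(np.abs(err) ** 2)))
```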
Solution 2:
Since your question was about the geometry behind convergence, I'll chime in with a very geometric way to think about these concepts. However, as Qiaochu Yuan mentions, in order to do so we must first nail down what we mean by convergence. I'll discuss the "big three" types of convergence: pointwise, uniform, and mean-square (also called $L^2$) convergence.
Let's begin by defining a notion of error between $f(x)$ and the $N$th partial sum of its Fourier series, denoted by $F_N(x)$, on $-\ell<x<\ell$. Define the (absolute) pointwise error, $p_N(x)$, by $$p_N(x)=|f(x)-F_N(x)|, \quad -\ell<x<\ell.$$ The name reflects the geometry of the situation: $p_N(x)$ measures the point-by-point difference (or error) between $f(x)$ and $F_N(x)$.
We can then define the following three types of convergence based on the behavior of $p_N(x)$ as $N\to\infty$.
- $F_N(x)$ converges pointwise to $f(x)$ on $-\ell<x<\ell$ if $$p_N(x)\to 0 \text{ as } N\to\infty \text{ for each fixed }x\in(-\ell,\ell).$$
- $F_N(x)$ converges uniformly to $f(x)$ on $-\ell<x<\ell$ if $$\sup_{-\ell<x<\ell}p_N(x)\to 0 \text{ as } N\to\infty.$$
- $F_N(x)$ converges in the mean-square or $L^2$ sense to $f(x)$ on $-\ell<x<\ell$ if $$\int_{-\ell}^\ell p_N^2(x)\,dx\to 0 \text{ as } N\to\infty.$$
Think of each of these in terms of what is happening with the pointwise error as $N\to \infty$. The first says that at each fixed $x$, the difference between $f(x)$ and $F_N(x)$ goes to zero. This may happen for some $x$ in the interval and fail for others. On the other hand, uniform convergence says that the supremum of all the pointwise errors tends to zero. Finally, mean-square convergence says that the area under $p_N^2(x)$ must tend to zero as $N\to\infty$.
The first is a very local way to measure error (at a point), whereas the second two are global ways to measure the error (across the entire interval).
We can formulate this in terms of norms by setting $$\|f-F_N\|_\infty:=\sup_{-\ell<x<\ell}|f(x)-F_N(x)|.$$ Then $F_N(x)\to f(x)$ uniformly on $-\ell<x<\ell$ provided $\|f-F_N\|_\infty\to 0$ as $N\to\infty$. (This is why we call it the uniform norm!)
On the other hand, if we set $$\|f-F_N\|_{L^2}:=\sqrt{\int_{-\ell}^\ell |f(x)-F_N(x)|^2\,dx},$$ then $F_N(x)\to f(x)$ in the $L^2$ sense on $-\ell<x<\ell$ provided $\|f-F_N\|_{L^2}\to 0$ as $N\to\infty$. (This is called the $L^2$ norm on $-\ell<x<\ell$.)
To illustrate this geometrically, here's $f(x)=x^2$ (black) and its Fourier sine series $F_N(x)$ (blue) on $0<x<1$ for $N=5,\dots,50$, together with the corresponding pointwise error (red). We can see this series converges pointwise but not uniformly on $0<x<1$. You can also get an idea of the $L^2$ convergence by envisioning the area under the square of the red curve and seeing it tend to zero as well. I was going to post that picture too, but the shaded area is so thin it is difficult to see.
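To reproduce those error curves numerically, here is a small Python sketch (assuming the closed-form coefficients $b_n = 2\int_0^1 x^2\sin(n\pi x)\,dx$; helper names are illustrative): the sup error stalls because the partial sums vanish at $x=1$ while $f$ approaches $1$ there, but the $L^2$ error keeps shrinking.

```python
import numpy as np

def b(n):
    """Closed-form sine coefficient b_n = 2 * int_0^1 x^2 sin(n pi x) dx."""
    a = n * np.pi
    return 2 * ((-1) ** (n + 1) / a + 2 * ((-1) ** n - 1) / a ** 3)

def F(x, N):
    """N-th partial sum of the Fourier sine series of x^2 on (0, 1)."""
    return sum(b(n) * np.sin(n * np.pi * x) for n in range(1, N + 1))

x = np.linspace(0, 1, 4001)[1:-1]  # open interval 0 < x < 1
f = x ** 2
for N in [5, 50, 500]:
    p = np.abs(f - F(x, N))            # pointwise error p_N(x)
    sup_err = p.max()                  # approximates ||f - F_N||_inf
    l2_err = np.sqrt(np.mean(p ** 2))  # approximates ||f - F_N||_{L^2}
    print(N, sup_err, l2_err)
# sup_err stays bounded away from 0 (the partial sums vanish at x = 1,
# and there is a Gibbs overshoot); l2_err tends to 0, matching
# pointwise-but-not-uniform convergence.
```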
These illustrations are of course not proofs of convergence, but simply a way to interpret it geometrically.
For the sake of completeness, here's an example which does converge uniformly: the same function and interval as above, but $F_N(x)$ is the Fourier cosine series.
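The same kind of check works here. A brief sketch, assuming the standard cosine expansion $x^2 = \frac{1}{3} + \sum_{n\ge 1}\frac{4(-1)^n}{n^2\pi^2}\cos(n\pi x)$ on $[0,1]$: this time the sup error itself tends to zero.

```python
import numpy as np

def G(x, N):
    """N-th partial sum of the Fourier cosine series of x^2 on (0, 1)."""
    return 1 / 3 + sum(4 * (-1) ** n / (n * np.pi) ** 2 * np.cos(n * np.pi * x)
                       for n in range(1, N + 1))

x = np.linspace(0, 1, 4001)
for N in [5, 50, 500]:
    sup_err = np.abs(x ** 2 - G(x, N)).max()
    print(N, sup_err)  # the sup error itself tends to 0: uniform convergence
```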
Hope that helps.
Solution 3:
You can write the partial sum $S_n(x)$ as an integral $${1\over 2\pi}\int_{-\pi}^\pi D_n(t) f(x-t)\,dt,$$ where the weight function or "kernel" $D_n(t)$ can be computed and graphed once and for all. One obtains $$D_n(t)={\sin((n+1/2)t)\over \sin(t/2)}.$$ So $S_n(x)$ is an "average" of $f$-values from the neighborhood of $x$. The essential point is that $D_n(t)$ is heavily concentrated around $t=0$ and oscillates rapidly away from $0$, so the distant contributions largely cancel and $S_n(x)$ behaves like a local average of $f$ near $x$.
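As a concrete sanity check, here is a brief Python sketch comparing the kernel formula with the directly computed partial sum; the test function $f(x)=x$ on $(-\pi,\pi)$, extended $2\pi$-periodically, and the grid size are just illustrative choices.

```python
import numpy as np

def dirichlet(t, n):
    """Dirichlet kernel D_n(t) = sin((n + 1/2) t) / sin(t / 2)."""
    return np.sin((n + 0.5) * t) / np.sin(t / 2)

def f(x):
    """Sawtooth: f(x) = x on (-pi, pi), extended 2*pi-periodically."""
    return (x + np.pi) % (2 * np.pi) - np.pi

n, x, M = 10, 1.0, 20000
# midpoint grid on (-pi, pi); the midpoints never hit t = 0, where the
# formula is 0/0 (the removable value there is 2n + 1)
t = -np.pi + (np.arange(M) + 0.5) * (2 * np.pi / M)

# partial sum via the kernel: S_n(x) = (1/2pi) int D_n(t) f(x - t) dt;
# the mean over the grid already includes the 1/(2 pi) factor
kernel_value = np.mean(dirichlet(t, n) * f(x - t))

# partial sum directly, from the known coefficients of the sawtooth:
# x ~ sum_{k >= 1} 2 (-1)^{k+1} sin(kx) / k
direct_value = sum(2 * (-1) ** (k + 1) * np.sin(k * x) / k for k in range(1, n + 1))

print(kernel_value, direct_value)  # agree to quadrature accuracy
```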