What are all the generalizations needed to pass from finite-dimensional linear algebra with matrices to Fourier series and PDEs?
I've studied linear algebra in finite dimensions, and now I'm studying Fourier series, Sturm-Liouville problems, PDEs, etc. However, none of our lecturers made any connection between linear algebra and these topics. I think this is a big mistake, because I see many questions here where people talk about these topics as generalizations of simple linear algebra.
So what are the generalizations? For example, I think a matrix becomes a function in infinite dimensions, but that's about all I know.
Solution 1:
The generalization you're looking for is called functional analysis. Just as you might suspect, vectors turn into functions, and matrices turn into linear operators that act on functions. Basis expansions turn into Fourier-type series. Eigenvalues and eigenvectors generalize to an area called spectral theory. Dot products generalize to inner products... and so on. The best introductory book I know of is "Introductory Functional Analysis with Applications" by Kreyszig. It's perfect for someone who has taken linear algebra and perhaps a first course in analysis (at the level of "Baby Rudin").
Here are a few more details:
Basis Expansions
- In linear algebra, you think about expanding a vector $x$ into a basis. What does this mean? Well, if $x\in V$ and $V$ is a finite dimensional vector space (say $\text{dim}(V) = n$) we can choose a special set of vectors $e_1,\ldots,e_n$ such that for any $x\in V$, there are coefficients $\alpha_1,\ldots,\alpha_n$ such that
$$ x = \sum_{j=1}^n\alpha_j e_j $$ In functional analysis, you can usually do the same thing. Given a function $f\in X$, where $X$ is now a vector space of functions, one can usually find a special infinite sequence $e_1(x),e_2(x),\ldots$ such that for any $f\in X$, there are coefficients $\alpha_1,\alpha_2,\ldots$ such that
$$ f(x) \stackrel{X}{=} \sum_{j=1}^\infty\alpha_j e_j(x) $$ Since this is an infinite sum, you have to be much more careful about what you mean by $=$, hence putting an $X$ on top to indicate that this is "equality in the sense of the vector space $X$". This becomes more precise when you talk about norms, inner products, and metrics. An example would be the so-called Hilbert space $L^2([0,1])$, where equality means
$$ \lim_{n\rightarrow\infty} \int_0^1\left\vert f(x) - \sum_{j=1}^n \alpha_j e_j(x)\right\vert ^2 dx = 0 $$ Other examples obviously exist as well, but I won't go into them. Since it sounds like you're interested in Fourier series, you'll be happy to know that this is exactly what I'm talking about: if you take
$$ e_k(x) = \exp(2\pi i kx), \qquad k\in\mathbb{Z}, $$ then any $f\in L^2([0,1])$ will have coefficients $\alpha_0,\alpha_{\pm 1},\alpha_{\pm 2},\ldots$ such that $ f= \sum_{k=-\infty}^\infty \alpha_k\exp(2\pi ikx)$, where again $=$ means "in the $L^2$ sense".
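If you want to see this convergence numerically, here is a minimal sketch in Python/NumPy (the test function $f(x)=x(1-x)$, the grid size, and the Riemann-sum quadrature are arbitrary illustrative choices; the coefficient formula $\alpha_k=\int_0^1 f(x)e^{-2\pi i kx}\,dx$ is just the inner product discussed below):

```python
import numpy as np

N = 4096
x = np.arange(N) / N                      # uniform grid on [0, 1), spacing 1/N
dx = 1.0 / N
f = x * (1.0 - x)                         # an arbitrary test function in L^2([0, 1])

def coeff(k):
    # alpha_k = integral_0^1 f(x) exp(-2*pi*i*k*x) dx, approximated by a Riemann sum
    return np.sum(f * np.exp(-2j * np.pi * k * x)) * dx

for n in (1, 4, 16, 64):
    partial = sum(coeff(k) * np.exp(2j * np.pi * k * x) for k in range(-n, n + 1))
    l2_error = np.sqrt(np.sum(np.abs(f - partial) ** 2) * dx)
    print(f"n = {n:3d}   L2 error of partial sum: {l2_error:.2e}")
```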
Just like in finite dimensions, a special role is played by orthonormal bases in infinite dimensions. In finite dimensions, if you have a basis $e_1,\ldots,e_n$ that is orthonormal, then you can always write down the expansion coefficients as dot products:
$$ x = \sum_{j=1}^n (x\cdot e_j) e_j $$ In some function spaces, you have a generalization of the dot product called an inner product, written usually as $\langle f,g\rangle$. While there are many inner products, the most popular ones tend to be the $L^2$ inner products:
$$ \langle f,g\rangle_{L^2([a,b])} = \int_a^b f(x)\overline{g(x)} dx $$ Then, if $e_1,\ldots$ is an orthonormal basis for $L^2([a,b])$, you can write
$$ f(x) \stackrel{L^2}{=} \sum_{j=1}^\infty \langle f,e_j\rangle e_j(x) $$ Sometimes you don't have an inner product, so "orthonormal basis" doesn't make sense (there is no way to measure angles). But you can still do basis expansions; it just might be harder to compute the expansion coefficients. There are whole books on this subject - see e.g. A Basis Theory Primer by Heil or Frames and Bases by Christensen.
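To make the inner-product expansion formula above concrete with a non-trigonometric basis, here is a small sketch (my choice of basis, test function, and truncation, purely for illustration): the normalized Legendre polynomials $\sqrt{(2j+1)/2}\,P_j$ form an orthonormal basis of $L^2([-1,1])$, and the partial sums $\sum_j \langle f,e_j\rangle e_j$ converge to $f$ in $L^2$:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

x = np.linspace(-1.0, 1.0, 4001)
dx = x[1] - x[0]
f = np.exp(x)                                   # an arbitrary test function in L^2([-1, 1])

def e(j):
    # normalized Legendre polynomial sqrt((2j+1)/2) * P_j; these are orthonormal in L^2([-1, 1])
    return np.sqrt((2 * j + 1) / 2.0) * Legendre.basis(j)(x)

approx = np.zeros_like(f)
for j in range(8):
    alpha = np.sum(f * e(j)) * dx               # <f, e_j>, approximated by a Riemann sum
    approx = approx + alpha * e(j)
    err = np.sqrt(np.sum((f - approx) ** 2) * dx)
    print(f"{j + 1} terms   L2 error = {err:.2e}")
```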
Linear Operators
- In linear algebra, you work a lot with matrices, which are representations of linear operators $L:V\rightarrow W$, where $V$ and $W$ are vector spaces of dimensions $n$ and $m$. The matrix is a representation because we must choose bases for $V$ and $W$ in order to define it! Assuming you've selected bases, you can then write down a matrix $A$ such that
$$ (A\alpha)_i = \sum_{j=1}^n A_{ij}\alpha_j,\quad 1\leq i\leq m $$ where $\alpha$ is the vector of coefficients representing $x\in V$. This generalizes to functions in the same way, except you will need an infinite matrix:
$$ (A\alpha)_i = \sum_{j=1}^\infty A_{ij} \alpha_j,\quad 1\leq i\leq m $$ Here $m$ can either be finite or infinite, depending on the dimension of $W$.
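Here is a rough numerical sketch of this "infinite matrix" picture (everything below - the operator, the test function, the truncation level - is an illustrative choice, not anything canonical): truncate the Fourier basis of $L^2([0,1])$, form the matrix entries $A_{ij}=\langle Le_j,e_i\rangle$ for a bounded operator $L$, and check that $A$ applied to the coefficient vector of $f$ reproduces the coefficients of $Lf$ up to truncation and quadrature error.

```python
import numpy as np

N, K = 4096, 20                                    # grid size and basis truncation |k| <= K
x = np.arange(N) / N
dx = 1.0 / N
ks = np.arange(-K, K + 1)
E = np.exp(2j * np.pi * np.outer(ks, x))           # row j of E samples e_{ks[j]}(x) = exp(2*pi*i*ks[j]*x)

def inner(u, v):
    # L^2([0,1]) inner product <u, v>, approximated by a Riemann sum
    return np.sum(u * np.conj(v)) * dx

L = lambda u: (2.0 + np.cos(2 * np.pi * x)) * u    # a bounded operator: multiplication by a smooth function

f = np.exp(np.cos(2 * np.pi * x))                  # a smooth periodic test function
alpha = np.array([inner(f, e) for e in E])         # coefficient vector of f
A = np.array([[inner(L(E[j]), E[i]) for j in range(2 * K + 1)] for i in range(2 * K + 1)])

beta = np.array([inner(L(f), e) for e in E])       # coefficients of L f, computed directly
print(np.max(np.abs(A @ alpha - beta)))            # tiny: the truncated matrix represents L
```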
Another perspective on linear operators comes from thinking about generalizing matrix-vector products from sums to integrals. Loosely speaking, you can imagine that as $n\rightarrow \infty$, you might have
$$ \sum_{j=1}^n A_{ij}\alpha_j \longrightarrow \int_{0}^1 A(x,y) f(y) dy $$ in the appropriate sense. Here $A(x,y)$ is now a function of two variables, so, as you suspected, matrices turn into functions. This perspective is extremely useful, as it comes up in the theory of Green's functions, finite elements, and integral equations.
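As a concrete (and optional) illustration, the kernel $A(x,y)=\min(x,y)(1-\max(x,y))$ is the Green's function of $-u''=f$ with $u(0)=u(1)=0$, and a Riemann sum turns the integral operator back into a matrix-vector product; the grid size below is arbitrary:

```python
import numpy as np

n = 400
y = (np.arange(n) + 0.5) / n                       # midpoint grid on (0, 1)
dy = 1.0 / n
X, Y = np.meshgrid(y, y, indexing="ij")
G = np.minimum(X, Y) * (1.0 - np.maximum(X, Y))    # kernel "matrix" G[i, j] = G(x_i, y_j)

f = np.ones(n)                                     # right-hand side f = 1
u = (G @ f) * dy                                   # u(x_i) ~ integral_0^1 G(x_i, y) f(y) dy

exact = y * (1.0 - y) / 2.0                        # exact solution of -u'' = 1, u(0) = u(1) = 0
print(np.max(np.abs(u - exact)))                   # small discretization error
```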
One thing I will mention is that issues of domain and range become much more subtle. Whereas in finite dimensions it is fairly simple to talk about the domain and range of a linear operator, in infinite dimensions you run into issues having to do with boundedness. For example, the derivative operator $L = \frac{d}{dx}$ is a very important linear operator to study. However, it is "unbounded" on most "standard" spaces of functions. This is essentially because we can have "small" functions that get extremely "big" after differentiation - take, for instance, $f(x) = \epsilon \sin(x/\epsilon^2)$, where $\epsilon$ is a very small number. $f(x)$ is very "small" because it has a very small amplitude, but $f^\prime(x)=\frac{1}{\epsilon}\cos(x/\epsilon^2)$ is very "large".
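A quick numerical check of that last example (the interval $[0,2\pi]$ and the values of $\epsilon$ are arbitrary choices):

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 200001)            # fine grid so the oscillations are resolved
for eps in (0.5, 0.2, 0.1, 0.05):
    f = eps * np.sin(x / eps**2)
    df = np.cos(x / eps**2) / eps                  # f'(x), computed analytically
    print(f"eps = {eps:4.2f}   max|f| = {np.max(np.abs(f)):.3f}   "
          f"max|f'| = {np.max(np.abs(df)):.1f}")
```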
Eigenvalues and Eigenvectors
- In linear algebra, you ask whether, for a linear operator $L$, there exist special $v_j\in V$ and $\lambda_j\in \Bbb{C}$ such that
$$ Lv_j = \lambda_jv_j $$ You can do the same in functional analysis, with a few extra technical details (for instance, instead of finitely many eigenvalues or even countably infinitely many, you can have an entire interval of eigenvalues). Eigenvalues are still called eigenvalues, but eigenvectors are usually called eigenfunctions. A great example is the differentiation operator: if $L = \frac{d}{dx}$, without getting too technical, you can think about any exponential function $v(x) = \exp(\lambda x)$ as being an eigenfunction of $L$ with eigenvalue $\lambda$, since
$$ (Lv)(x) = \lambda v(x) $$ Furthermore, the spectral decomposition of a Hermitian matrix (Hermitian means $A = A^\dagger$), usually written $A = Q\Lambda Q^\dagger$, where $\Lambda$ is a diagonal matrix and $Q$ is a unitary matrix, turns into the spectral theorem for self-adjoint operators, which states that any bounded self-adjoint operator $L$ (self-adjointness is the analogous symmetry condition) has a decomposition of the form $L = U T_\lambda U^\dagger$, where $U$ is a unitary operator and $T_\lambda$ is a multiplication operator (the generalization of a diagonal matrix). An example of this is the relationship between convolution and the Fourier transform. Say we define a linear operator $L_k$ as follows:
$$ (L_kf)(x) = \int_{-\infty}^\infty k(x-y)f(y)dy $$ Then, with some conditions on $k$, one can prove that this operator is bounded and self-adjoint, and furthermore, we have a "spectral decomposition" of $L$ of the form
$$ L = \mathcal{F}^*T_\lambda \mathcal{F} $$ where $\mathcal{F}$ is the (unitary) Fourier transform:
$$ (\mathcal{F}f)(\xi) = \int_{-\infty}^\infty f(x)\exp(-2\pi i \xi x) dx $$ This is sometimes written as
$$ (Lf)(x) = \mathcal{F}^{-1} [\hat{k}(\xi)\hat{f}(\xi)] $$ In other words, the eigenvalues of $L_k$ are given by $\hat{k}(\xi)$ (a continuum of them!) and the eigenfunctions of $L_k$ are the complex exponentials.
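A discrete, periodic caricature of this (my own illustration, with an arbitrary kernel and input): circular convolution of vectors of length $n$ is diagonalized by the discrete Fourier transform, mirroring $L=\mathcal{F}^*T_\lambda\mathcal{F}$, with the DFT of the kernel playing the role of $\hat{k}(\xi)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
k = np.exp(-np.linspace(-5.0, 5.0, n) ** 2)        # an arbitrary smooth kernel
f = rng.standard_normal(n)                         # an arbitrary input vector

# Direct circular convolution: (L_k f)[i] = sum_j k[(i - j) mod n] * f[j]
direct = np.array([np.sum(k[(i - np.arange(n)) % n] * f) for i in range(n)])

# "Spectral" form: inverse DFT of (DFT of k) * (DFT of f), i.e. F^{-1}[k_hat * f_hat]
spectral = np.fft.ifft(np.fft.fft(k) * np.fft.fft(f)).real

print(np.max(np.abs(direct - spectral)))           # agreement up to rounding error
```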
Solution 2:
Actually, separation of variables and expansions in the orthogonal eigenfunctions of a Sturm-Liouville problem came before finite-dimensional linear algebra and the study of selfadjoint matrices. So, strictly speaking, these topics are not generalizations of the linear algebra topics--they came first, which is one of the quirks of the subject: the infinite-dimensional studies drove the finite-dimensional ones, not the other way around.
If you have a selfadjoint square matrix $A$ on an $N$-dimensional real or complex space, then there is a finite number of eigenvalues $\lambda$ for which $A-\lambda I$ has a non-trivial null space. These eigenvalues $\{ \lambda_1,\lambda_2,\cdots,\lambda_n\}$ are the roots of the characteristic polynomial $p(\lambda)=\mbox{det}(A-\lambda I)$, and for any such $\lambda_k$, a non-trivial solution of $(A-\lambda_k I)x=0$ is an eigenvector with eigenvalue $\lambda_k$. Automatically (and this came from Sturm-Liouville theory), the eigenvalues are real and eigenvectors associated with different eigenvalues are orthogonal to each other. For example, using the complex inner product, if $Ax=\lambda x$ for some $\lambda\in\mathbb{C}$ and $x \ne 0$, then $\lambda$ is real because the symmetry of $A$ gives
$$ (\lambda-\overline{\lambda})(x,x)=(Ax,x)-(x,Ax) = 0 \implies \lambda=\overline{\lambda}. $$
And, if $Ax_1 = \lambda_1 x_1$, $Ax_2=\lambda_2 x_2$ with $\lambda_1 \ne \lambda_2$, then $(x_1,x_2)=0$ because the symmetry of $A$ gives
$$ (\lambda_1-\lambda_2)(x_1,x_2)=(Ax_1,x_2)-(x_1,Ax_2) = 0. $$

Now, if you perform Gram-Schmidt on the eigenspaces $\mbox{ker}(A-\lambda_k I)$, then you obtain an orthonormal basis of eigenvectors of $A$, which allows you to expand
$$ x = \sum_{k=1}^{n} \sum_{j=1}^{n_k}\langle x,e_{k,j}\rangle e_{k,j} $$
and then to represent $A$ in a simple diagonal form:
$$ A x = \sum_{k=1}^{n} \sum_{j=1}^{n_k}\lambda_k \langle x,e_{k,j}\rangle e_{k,j}. $$
All of this analysis grew out of Fourier Analysis. Fourier Analysis is not a generalization of it; rather, this finite-dimensional analysis is the specialization of Fourier Analysis ideas to a finite-dimensional setting.
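Here is a short numerical restatement of these finite-dimensional facts (illustration only; the random Hermitian matrix and vector are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B + B.conj().T                                 # a selfadjoint (Hermitian) matrix

lam, Q = np.linalg.eigh(A)                         # real eigenvalues, orthonormal eigenvector columns
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

coeffs = Q.conj().T @ x                            # the numbers <x, e_k>
print(np.allclose(Q.conj().T @ Q, np.eye(n)))      # the eigenvectors are orthonormal
print(np.allclose(x, Q @ coeffs))                  # x = sum_k <x, e_k> e_k
print(np.allclose(A @ x, Q @ (lam * coeffs)))      # A x = sum_k lambda_k <x, e_k> e_k
```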
Fourier Analysis on $[-\pi,\pi]$ may be viewed as the eigenfunction analysis of the differential operator $A = -\frac{d^2}{dx^2}$ on the domain of functions $f$ with two derivatives in $L^2$ satisfying $f(-\pi)=f(\pi)$ and $f'(-\pi)=f'(\pi)$. The operator $A$ is symmetric with respect to the integral inner product $$ (f,g) = \int_{-\pi}^{\pi}f(t)\overline{g(t)}dt. $$ That is, $(Af,g)=(f,Ag)$ for all $f,g\in\mathcal{D}(A)$. Because of this, the eigenvalues of $A$ must be real (same argument as above) and the eigenfunctions corresponding to different eigenvalues must be orthogonal. The eigenfunctions are $$ 1,\cos(nx),\sin(nx),\;\;\; n=1,2,3,\cdots, $$ and the eigenvalues are $0,1^2,2^2,3^2,\cdots,n^2,\cdots$. For example, $$ A\cos(nx) = n^2 \cos(nx),\;\;\; A\sin(nx) = n^2 \sin(nx). $$ The eigenspace for $n=0$ is one-dimensional, with $A 1 = 0 \cdot 1$ (the constant function $1$, with eigenvalue $0$). The eigenspace for $n\ne 0$ is two-dimensional and is spanned by $\sin(nx),\cos(nx)$. It is automatic that $\sin(nx),\cos(nx)$ are orthogonal to $\sin(mx),\cos(mx)$ whenever $n\ne m$, while $\sin(nx)$ and $\cos(nx)$ are orthogonal to each other because of how they were chosen (nothing automatic there). The normalized eigenfunctions are \begin{align} e_0 & = \frac{1}{\sqrt{2\pi}}, \\ e_{1,1} & = \frac{1}{\sqrt{\pi}}\cos(x), \quad e_{1,2}=\frac{1}{\sqrt{\pi}}\sin(x), \\ e_{2,1} & =\frac{1}{\sqrt{\pi}}\cos(2x), \quad e_{2,2}=\frac{1}{\sqrt{\pi}}\sin(2x), \\ & \;\;\vdots \end{align} The orthogonal function expansion of $f\in L^2[-\pi,\pi]$ in these eigenfunctions of $A$ is "the Fourier series" of $f$. This expansion always converges to $f$ in the norm of $L^2[-\pi,\pi]$, and it constitutes an orthogonal diagonalization of the symmetric operator $A=-\frac{d^2}{dx^2}$, just as in the matrix case.
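If you like to see this numerically, a finite-difference caricature of $A=-\frac{d^2}{dx^2}$ with periodic boundary conditions reproduces the eigenvalue pattern $0,1,1,4,4,9,9,\dots$ (the matrix below is only an approximation of $A$, and the grid size is an arbitrary choice):

```python
import numpy as np

n = 400
h = 2 * np.pi / n                                  # grid spacing on [-pi, pi) with periodic wrap-around
I = np.eye(n)
A = (2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)) / h**2   # centered-difference -d^2/dx^2

lam = np.sort(np.linalg.eigvalsh(A))               # A is symmetric, so the eigenvalues are real
print(np.round(lam[:9], 3))                        # approximately 0, 1, 1, 4, 4, 9, 9, 16, 16
```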
Other Sturm-Liouville problems exist for $-\frac{d^2}{dx^2}$. For example, $$ Lf = -\frac{d^2}{dx^2}f, \\ \cos\alpha\, f(a)+\sin\alpha\, f'(a) = 0, \\ \cos\beta\, f(b) + \sin\beta\, f'(b) = 0. $$ The domain for $L$ here includes the above separated endpoint conditions. Again there is a discrete set of eigenvalues, all of the eigenspaces are one-dimensional, and every $f\in L^2[a,b]$ can be expanded in a "Fourier series" of the corresponding orthogonal eigenfunctions. $L$ is again symmetric, so the eigenvalues are real and eigenfunctions corresponding to distinct eigenvalues are mutually orthogonal. These expansions include sine expansions, cosine expansions, and a wealth of other trigonometric function expansions where the eigenvalues are not at all evenly spaced. And all of the resulting Fourier expansions converge in $L^2$ to the original function.
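As an illustrative sketch of one such problem (choosing $a=0$, $b=1$, $\alpha=\beta=0$, i.e. Dirichlet conditions $f(a)=f(b)=0$), a finite-difference discretization shows the same structure: real eigenvalues near $(n\pi)^2$, mutually orthogonal eigenvectors approximating $\sin(n\pi x)$, and an expansion of any grid function in those eigenvectors:

```python
import numpy as np

n = 500
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)                        # interior grid points of [0, 1]
I = np.eye(n)
L = (2 * I - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2   # -d^2/dx^2 with f(0)=f(1)=0

lam, V = np.linalg.eigh(L)                         # real eigenvalues, orthonormal eigenvector columns
print(np.round(lam[:4] / np.pi**2, 3))             # approximately 1, 4, 9, 16, i.e. lambda_n ~ (n*pi)^2

f = x * (1.0 - x) ** 2                             # an arbitrary function vanishing at the endpoints
coeffs = V.T @ f                                   # coefficients <f, e_n> of the discrete "sine series"
print(np.allclose(f, V @ coeffs))                  # the eigenvector expansion reproduces f
```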
The generalizations needed: Linear space, inner product and orthogonality, norm, approximation in norm and topology, symmetric and selfadjoint operators, orthogonal eigenvector expansions, differential operators, closed densely-defined unbounded operators, spectrum.
I've added a chart of development from Dieudonné's "A History of Functional Analysis" so that you can see how the infinite-dimensional gave rise to the finite-dimensional. This is upside down compared to most mathematical development: the most abstract came first and filtered down for a long time before reversing course.