Geometric intuition of graph Laplacian matrices
This is to satisfy the request of @Surb which I quote as :
Providing intuition on the construction and possible uses of the graph-Laplacian with layman terms would be great. Ideally the discussion is supported with a simple example.
(Note : This usually disappears from the post following bounty expiration, hence I include it).
Introduction
Let me first define the graph Laplacian : given a simple finite graph $G = (V = \{v_i\},E)$, let $A$ denote the adjacency matrix of $G$ and $D$ the degree matrix, i.e. the diagonal matrix whose $i$th diagonal entry is $\deg(v_i)$. The matrix $L = D-A$ is called the Laplacian matrix of the graph $G$.
There are many variants of the Laplacian, but these can all be expressed in terms of $D$ and $A$, and they serve purposes that, at a high level, are analogous to those of $L$.
We begin with some preliminary observations about $L$ : it's a symmetric matrix, and furthermore, one can verify that $L = M^TM$ where $M$ is an $|E| \times |V|$ matrix given by $M_{(i,j),i} = 1$, $M_{(i,j),j} = -1$ and $M_{(i,j),k} = 0$ for $i,j,k \in V$ distinct. Hence, $L$ is in fact positive semidefinite, and therefore admits only non-negative real eigenvalues. All row sums of $L$ are equal to $0$, hence $L$ admits the eigenvector $[1,1,...,1]$ with eigenvalue $0$.
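If you want to see these facts concretely, here is a minimal numpy sketch (the small graph is an arbitrary choice) checking that $L = D - A = M^TM$, that the spectrum is non-negative, and that the all-ones vector lies in the kernel :

```python
import numpy as np

# An arbitrary example graph on 4 vertices: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
D = np.diag(A.sum(axis=1))
L = D - A

# Oriented incidence matrix M (|E| x |V|): row (i,j) has +1 in column i and -1 in column j.
M = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    M[r, i], M[r, j] = 1, -1

print(np.allclose(L, M.T @ M))   # True: L = M^T M, hence positive semidefinite
print(np.linalg.eigvalsh(L))     # all eigenvalues are >= 0
print(L @ np.ones(n))            # the all-ones vector is in the kernel (row sums are 0)
```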
We will now talk about the geometric properties of $L$, and link these to the probabilistic aspect of $L$.
The geometry of the graph, and $L$
The simplest thing that one can find from $L$ is the number of connected components of the graph $G$.
Result : The geometric multiplicity of $0$ as an eigenvalue of $L$ (which we know is at least $1$, since the all-ones vector lies in the kernel) equals the number of connected components of $G$.
Proof : Suppose that $Lw = 0$. Then $(D-A)w = 0$, so in particular $Dw = Aw$, i.e. for all $i$ we have $\deg(v_i)w_i = \sum_{v_j \sim v_i} w_j$. Fix a connected component $C$ and suppose that $i^*$ satisfies $w_{i^*} = \max_{v_i \in C} w_i$. Then $$ \deg(v_{i^*})w_{i^*} = \sum_{v_j \sim v_{i^*}} w_{i^*} \geq \sum_{v_j \sim v_{i^*}} w_j $$
But the left-hand side equals the right-hand side by the equation above, so the inequality is in fact an equality, forcing $w_{j} = w_{i^*}$ for all $j$ such that $v_j \sim v_{i^*}$. Repeating the argument from each such neighbour, $w$ is constant on $C$. Therefore $w$ is a linear combination of vectors of the form $1_C$, where $C$ is a connected component of $G$; conversely, each $1_C$ lies in the kernel of $L$, and these vectors are linearly independent. This furnishes the proof. $\blacksquare$
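To illustrate the result numerically, here is a small sketch (the two-component graph is an arbitrary choice; networkx is used only for convenience) :

```python
import numpy as np
import networkx as nx

# Arbitrary example: a 4-cycle and a path on 3 vertices, taken as a disjoint union (2 components).
G = nx.disjoint_union(nx.cycle_graph(4), nx.path_graph(3))

L = nx.laplacian_matrix(G).toarray().astype(float)
eigenvalues = np.linalg.eigvalsh(L)

# Count the eigenvalues that are (numerically) zero and compare with the number of components.
num_zero = np.sum(np.abs(eigenvalues) < 1e-9)
print(num_zero, nx.number_connected_components(G))   # both are 2
```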
Brief explanations
Let me now start with the layman explanations. When we say the "geometry" of a graph, what exactly are we trying to talk about? Let's try to write down some geometric characteristics. We have the diameter of the graph, the furthest distance between two vertices. We have, in some sense, a sparsity parameter, which tells you how quickly you can get from an average vertex to another vertex. Similarly, you'd consider the average degree to be a feature of the geometry of the graph.
What does the Laplacian do? Well, the Laplacian can do a lot of things, which I'll mention in brief and then go on to elaborate :
- Following some normalization, the Laplacian yields a transition kernel, which is a probability transition function that allows us to introduce a random walk on the graph. The random walk provides us with means to explore the graph : averaging the exploration then yields averages for critical quantities such as the average degree, average distance between points and so on, and using the variance of this random process, one can easily provide concentration bounds using Chebyshev's inequality as well.
- The eigenvectors of the Laplacian, when obtained as solutions of certain minimax problems, yield information about the connectivity of the graph. The basic reason is that the functionals involved directly compare adjacent vertices, as I'll explain later.
- Continuous surfaces such as manifolds can be discretely approximated by graph-like structures (think of a cycle on $n$ vertices approximating a circle for large $n$). This allows us to think about continuous geometric features, such as the curvature, the area/volume, and so on. It turns out that doing this, for a large class of manifolds, is actually a very important development in mathematics.
Random walk
The (unweighted) random walk on a graph is a Markov chain on the vertices of the graph, whose transition matrix is $L_{rw} = I - D^{-1}L = D^{-1}A$. You can check that $(L_{rw})_{ij} = \frac 1{\deg(v_i)}$ if $v_i \sim v_j$ and $0$ otherwise. What this basically means is : if I'm at a vertex, I look at its neighbours and give each of them an equal chance to be my next destination. Thus, I walk around the graph in this fashion.
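In code, the transition matrix and a simulated walk look as follows (a minimal sketch : the graph, the starting vertex and the number of steps are arbitrary choices) :

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Arbitrary example graph: a cycle on 5 vertices with one chord.
G = nx.cycle_graph(5)
G.add_edge(0, 2)

A = nx.adjacency_matrix(G).toarray().astype(float)
D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A   # L_rw = I - D^{-1}L: row i gives probability 1/deg(v_i) to each neighbour of v_i

# Simulate the random walk: from the current vertex, pick a neighbour according to the current row of P.
current, visits = 0, np.zeros(G.number_of_nodes())
for _ in range(10000):
    current = rng.choice(G.number_of_nodes(), p=P[current])
    visits[current] += 1
print(visits / visits.sum())   # empirical proportion of time spent at each vertex
```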
The question is : what exactly can this random walk tell me? Let's think about a few scenarios, and ask ourselves what each of these scenarios, should it occur, tells us about the graph.
- I'm finding that I'm visiting the same vertex, or set of vertices, again and again. Perhaps the connectivity of this part of the graph is poor : it's an area which is difficult to get out of.
- I'm finding that I'm visiting a lot of new vertices : this is reflective of excellent connectivity in the region, that a lot of vertices in this area have a large degree and connect to each other often.
The average degree of the graph is reflected in $L$ itself, since the trace of $L$ is the sum of degrees of the graph (the diagonal entries are the degrees!). Therefore, the trace of $L$, divided by the number of vertices, gives the average degree.
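A quick check of this, on an arbitrary example graph :

```python
import numpy as np
import networkx as nx

G = nx.cycle_graph(5)   # arbitrary example graph
G.add_edge(0, 2)
L = nx.laplacian_matrix(G).toarray()

# trace(L) is the sum of degrees, so trace(L) / |V| is the average degree.
print(np.trace(L) / G.number_of_nodes(), sum(d for _, d in G.degree()) / G.number_of_nodes())
```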
The first scenario above is an instance of what we would describe as a "bottleneck" in the graph. A heavily connected graph has no "bottlenecks", i.e. no subsets of the graph that a random walk, once it gets in, finds very difficult to get out of, whereas a poorly connected graph will have such issues.
It turns out that this "bottleneck" scenario is captured beautifully by Cheeger's inequality. Without going into too many details, Cheeger's constant is defined to capture the worst bottleneck in the graph : it is given by $$C = \min_{\emptyset \neq S \subsetneq V} \frac{|E(S,S^c)|}{\min\left(\sum_{v \in S} \deg(v), \sum_{v \in S^c} \deg(v)\right)} \quad ; \quad E(S,S^c) = \{(v,v') \in E : v \in S , v' \in S^c\}$$
So the $S$ for which this minimum is attained is the worst one to get out of. It turns out that the smallest positive eigenvalue of the Laplacian can be used to provide tight bounds on $C$ (in fact, it turns out that higher eigenvalues will also come into play for even tighter bounds, but knowing this much is nice). The nice probabilistic way of proving this is to actually randomly walk through the graph, collect data on where you've been and what's been the hardest part to navigate, and use your findings to construct the bottleneck $S$. This is briefly explained here.
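Here is a brute-force sketch of Cheeger's constant on a small graph with an obvious bottleneck, compared against the two-sided bounds $\lambda_1/2 \leq C \leq \sqrt{2\lambda_1}$, which is the standard form of Cheeger's inequality for the degree-normalized constant used above (the example graph is an arbitrary choice) :

```python
import itertools
import numpy as np
import networkx as nx

# Arbitrary example: two triangles joined by a single edge (a clear bottleneck).
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
nodes = list(G.nodes)
deg = dict(G.degree())

def conductance(S):
    Sc = set(nodes) - set(S)
    cut = sum(1 for u, v in G.edges if (u in S) != (v in S))
    return cut / min(sum(deg[v] for v in S), sum(deg[v] for v in Sc))

# Brute force over all non-trivial subsets S (fine for a 6-vertex graph).
C = min(conductance(S)
        for r in range(1, len(nodes))
        for S in itertools.combinations(nodes, r))

# Smallest positive eigenvalue of the normalized Laplacian D^{-1/2} L D^{-1/2}.
lam1 = np.sort(np.linalg.eigvalsh(nx.normalized_laplacian_matrix(G).toarray()))[1]

print(C, lam1 / 2, np.sqrt(2 * lam1))   # lam1/2 <= C <= sqrt(2*lam1)
```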
In general, using a random walk to decipher properties of a graph is part of the more widely used "probabilistic method". The overarching principle of the probabilistic method is simple :
Sometimes, the best or worst parameter optimizing a quantity is well-approximated by randomly choosing the parameter.
Now, there's another thing that random walks can tell you. Let's say you're performing a random walk on a graph that looks like this : $$ 1 - 2 - 3 $$
You will very soon realize that, as you walk randomly on this graph, you spend about half of your time at $2$, and about a quarter of your time at each of $1$ and $3$. Imagine I was blindfolded and walked $100$ steps randomly across this graph, and was then asked how much of my time I had spent at each vertex : I'd say the proportions $\frac 14, \frac 12, \frac 14$ are a very accurate guess, right?
Now, let me ask myself two questions :
- How did I get $\frac 14 , \frac 12 , \frac 14$? More precisely, what are the proportions of time I spend at each vertex while randomly walking across a graph?
- The mixing question : suppose I start from a given point, and walk randomly across the graph. How many steps do I need to take before I can safely assume that the proportion of time I've spent at each vertex is very, very close to the proportions described above?
The answer to both questions is given by the random walk Laplacian. Indeed, the answer to the first is the unique left eigenvector of the random walk Laplacian, corresponding to the eigenvalue $1$, whose entries add up to $1$, i.e. it is the unique $w$ such that $wL_{rw} = w$ and $\sum w_i = 1$. Explicitly, $w_i = \frac{\deg(v_i)}{\sum_j \deg(v_j)}$, which is exactly where the $\frac 14, \frac 12, \frac 14$ above came from.
The answer to the second question is given by a result which states that, with a rate depending upon the smallest positive eigenvalue, my confidence (a loosely used word : accurately speaking, this is the distance between two probability distributions) in the guess that the proportion of time spent thus far at each vertex is very close to the answer to the first question increases exponentially with the number of steps I take.
This is a probabilistic expression of the fact that my exploration and understanding of a graphical structure's geometry gets exponentially better, with the exact rate given by a Laplacian eigenvalue. That should do the probabilistic method some justice.
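For the $1 - 2 - 3$ example above, both answers can be checked directly (a minimal sketch; the number of steps is an arbitrary choice) :

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)

G = nx.path_graph(3)                 # the path 1 - 2 - 3, with vertices relabelled 0, 1, 2
A = nx.adjacency_matrix(G).toarray().astype(float)
deg = A.sum(axis=1)
P = np.diag(1.0 / deg) @ A           # L_rw = I - D^{-1}L

# Question 1: the stationary distribution, i.e. the left eigenvector w with w P = w and sum(w) = 1.
# For an undirected graph it is proportional to the degrees: deg / sum(deg) = [1/4, 1/2, 1/4].
w = deg / deg.sum()
print(w, np.allclose(w @ P, w))

# Question 2: the proportion of time spent at each vertex approaches w as the walk gets longer.
current, visits = 0, np.zeros(3)
for _ in range(100000):
    current = rng.choice(3, p=P[current])
    visits[current] += 1
print(visits / visits.sum())         # close to [0.25, 0.5, 0.25]
```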
Understanding the geometry of $G$ using minimax-characterizations of the eigenvalues
It turns out that the special structure of $L$ (symmetric and positive semidefinite) means that the Laplacian eigenvalues (here taken for the normalized Laplacian $D^{-1/2}LD^{-1/2}$, which is the variant matching the degree-weighted quantities above; ordered as $0 = \lambda_0 \leq \lambda_1 \leq ... \leq \lambda_{n-1}$, where $n = |V|$) admit minimax characterizations like so : $$ \lambda_i = \inf_{\dim M=i+1} \max_{0 \neq f \in M} \frac{\sum_{u \sim v} (f(u)-f(v))^2}{\sum_{x} f(x)^2\deg(x)} $$ where $M$ varies over subspaces of $W = \{f : V \to \mathbb R\}$, a vector space of dimension $|V|$. Taking this for $\lambda_0$ and using the subspace of constant functions gives $0$. For $\lambda_1$, we can add to the constant functions any non-constant function and get a subspace of dimension $2$. Since the constant part doesn't contribute, we are led to this characterization : $$ \lambda_1 = \inf_{f \in W , \sum_{x\in V} f(x)\deg(x) = 0} \frac{\sum_{y \sim x} (f(y) - f(x))^2}{\sum_x f(x)^2 \deg(x)} $$
Thus, $\lambda_1$ is determined by the function which reduces local variation the most : to make the quotient small, we want $f(y)-f(x)$ to be small in absolute value whenever $y \sim x$.
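To see the characterization in action, one can compare $\lambda_1$ computed from the spectrum of the normalized Laplacian with the value of the quotient at its minimizer (a sketch on an arbitrary small graph) :

```python
import numpy as np
import networkx as nx

# Arbitrary example graph: a 6-cycle with one chord.
G = nx.cycle_graph(6)
G.add_edge(0, 3)

A = nx.adjacency_matrix(G).toarray().astype(float)
deg = A.sum(axis=1)
L = np.diag(deg) - A

# Normalized Laplacian D^{-1/2} L D^{-1/2}; its second-smallest eigenvalue is lambda_1.
N = np.diag(deg ** -0.5) @ L @ np.diag(deg ** -0.5)
vals, vecs = np.linalg.eigh(N)
lam1 = vals[1]

# The minimizer of the quotient is f = D^{-1/2} g, where g is the eigenvector for lambda_1.
f = vecs[:, 1] / np.sqrt(deg)
num = sum((f[u] - f[v]) ** 2 for u, v in G.edges)
den = np.sum(f ** 2 * deg)
print(lam1, num / den)   # the two values agree
print(np.sum(f * deg))   # approximately 0: f is orthogonal to constants in the degree-weighted sense
```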
This makes $\lambda_1$ the ideal candidate to study globalized properties of the graph : whether it's the worst bottleneck or the mixing time or anything that requires the graph to be considered in its entirety, $\lambda_1$ gets involved. In particular, $\lambda_1$ is expected to be quite stable under local perturbations of the graph : if I pinch the graph at a few points or put in a few edges here and there, don't expect $\lambda_1$ to change all that much.
A nice bound for the diameter of a graph, for example, is given by $D \geq \frac 1{\lambda_1 \sum_{x \in V} \deg(x)}$.
On the other hand, as you go to $\lambda_2,\lambda_3$ and so on, the finer features of the graph start getting captured, and these are used to give tighter bounds for quantities based on more local changes. For example, it is quite easy to see that $D$ equals the smallest $m$ for which the matrix $(I + L_{rw})^m$ has no zero entries. The elementary bound above is proven using this kind of fact, and the same idea can be used to sharpen it (a bound involving $\lambda_1$ and $\lambda_{n-1}$ is known).
Thus, the geometry of the graph is reflected in the minimax characterization of the eigenvalues. Furthermore, note that the eigenvectors can also be obtained from the minimax characterization, and they actually capture the variation that each eigenvalue measures. An eigenvector with somewhat controlled entries (all entries being close to each other) points to tame global behaviour, while an eigenvector with wild entries points, depending upon the index of the eigenvalue, to high local variation in the geometry of the graph.
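To make the diameter facts above concrete, here is a sketch checking that the diameter is the smallest $m$ for which $(I+L_{rw})^m$ has no zero entries, and evaluating the eigenvalue bound stated above (the example graph is an arbitrary choice) :

```python
import numpy as np
import networkx as nx

# Arbitrary example: a path on 6 vertices with one extra edge.
G = nx.path_graph(6)
G.add_edge(0, 2)

A = nx.adjacency_matrix(G).toarray().astype(float)
deg = A.sum(axis=1)
P = np.diag(1.0 / deg) @ A          # L_rw = I - D^{-1}L

# The diameter is the smallest m for which (I + P)^m has no zero entries:
# ((I + P)^m)_{ij} > 0 exactly when there is a walk of length at most m from i to j.
m, Q = 1, np.eye(len(deg)) + P
while np.any(np.linalg.matrix_power(Q, m) == 0):
    m += 1
print(m, nx.diameter(G))            # both give the diameter

# The eigenvalue bound D >= 1 / (lambda_1 * sum of degrees) stated in the text:
N = np.diag(deg ** -0.5) @ (np.diag(deg) - A) @ np.diag(deg ** -0.5)
lam1 = np.sort(np.linalg.eigvalsh(N))[1]
print(1.0 / (lam1 * deg.sum()))     # a (weak) lower bound on the diameter
```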
Continuous approximation
This one is actually pretty surprising. Let's say that you have a continuous surface, which has some continuous characteristics associated to it, like a volume/area, a curvature at each point of the surface, and so on.
A remarkable line of results from sometime in the $90$s showed that if one were to discretize this surface by creating a graph that sort of "meshes" or looks like it, and looked at the eigenvalues of the Laplacian of that graph, then as the mesh gets finer and finer and starts to resemble the surface more and more, the eigenvalues and eigenvectors of the graph Laplacian converge to the eigenvalues and eigenfunctions of the Laplacian (the Laplace-Beltrami operator) of the surface.
This rather astounding result means that we can talk about the geometry of certain kinds of approximating graphs by talking about the surface that they resemble! For example, a circle has constant curvature, and one can find the eigenvalues of the Laplacian of the cycle graph as shown here. One will be able to see the resemblance between these and the eigenfunctions of the Laplace-Beltrami operator on the circle : both are periodic, for example, and both carry $\sin$-$\cos$ type signatures.
What this means is that, for a graph, an eigenvector could very well assist in approximating the curvature, diameter and area of the surface it attempts to approximate. This also lets us make informed guesses.
For example, one can try to solve the Laplace-Beltrami eigenvalue problem on the unit circle exactly. Now, this circle has length $2\pi$ (and the disk it bounds has area $\pi$), its diameter is $2$, and its curvature is the same at every point (equal to $1$ for the unit circle). The important thing, though, is that once you do this, you've got excellent estimates for the eigenvalues and the eigenvectors of the cycle graph on $n$ vertices, for large $n$! The reverse also holds.
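As a concrete check of this discrete-to-continuous convergence, one can compare the eigenvalues of the cycle graph Laplacian, rescaled by the squared mesh size $h^2 = (2\pi/n)^2$, with the eigenvalues $0,1,1,4,4,9,9,\dots$ of the Laplace-Beltrami operator on the unit circle (a minimal numpy sketch) :

```python
import numpy as np
import networkx as nx

n = 200                 # number of vertices in the cycle; mesh size h = 2*pi/n on the unit circle
h = 2 * np.pi / n

L = nx.laplacian_matrix(nx.cycle_graph(n)).toarray().astype(float)
graph_eigs = np.sort(np.linalg.eigvalsh(L)) / h**2   # rescale by the squared mesh size

# Eigenvalues of the Laplace-Beltrami operator on the unit circle: k^2, with multiplicity 2 for k >= 1.
circle_eigs = np.sort([0] + [k**2 for k in range(1, 5) for _ in range(2)])

print(np.round(graph_eigs[:9], 3))   # approximately 0, 1, 1, 4, 4, 9, 9, 16, 16
print(circle_eigs)
```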
So this is something very interesting : a sort of continuous-to-discrete connection. Some results include :
- Isoperimetry : If a graph is "isoperimetric", i.e. for all subsets $S \subset V$ the "perimeter" $|E(S,S^c)|$ of $S$ is at least a quantity comparable (in a dimension-dependent way) to the size of $S$, which is $\sum_{x \in S}\deg(x)$, then the same holds for the shape it is approximating, i.e. for all subsets $A$ of that shape, the perimeter of $A$ compares to the area of $A$ in the same way.
- Curvature-type results : If a graph has heavily constrained eigenvectors at the higher eigenvalues (so local effects are extremely minimized), then the surface it is approximating will also have extremely uniform curvature. If you imagine something with really high curvature, though, it will bend back onto itself more quickly, and thus have a smaller area. Therefore, a corollary of this result is that a graph with heavily constrained eigenvectors at the higher eigenvalues helps bound the area of the shape it is approximating (and the diameter of the shape as well).
But, why is all this happening?
Well, it's all happening because of Fourier. Or, to be more precise, his expansion.
The point is that any function $f$ on the vertices of a (not necessarily finite) graph, which is square-integrable with respect to the measure that assigns to each vertex its degree as a weight, can be expanded as a weighted (possibly infinite) sum of the eigenvectors of the Laplacian matrix. These weights determine how much each eigenvalue contributes to $f$, and therefore how globally or locally $f$ behaves as a function.
It turns out that whether it's the diameter, or the Cheeger constant, or anything geometric that you can possibly think of, it can be captured by some function $f$, and then studied using the Fourier expansion of that $f$. The inequalities that are derived are obtained using truncations of, or bounds on, this Fourier expansion. That, however, connects the geometry of the graph to the analytic properties of the eigenvectors of its Laplacian.
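As a simplified illustration of such an expansion (to keep the sketch short, it uses the unnormalized Laplacian and the standard inner product instead of the degree-weighted one), any function on the vertices decomposes exactly into Laplacian eigenvectors, and the weights on high eigenvalues measure local variation :

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)

# Arbitrary example graph, and an arbitrary function f on its vertices.
G = nx.cycle_graph(8)
G.add_edge(0, 4)
L = nx.laplacian_matrix(G).toarray().astype(float)
f = rng.normal(size=G.number_of_nodes())

# Since L is symmetric, it has an orthonormal eigenbasis; the columns of U are the eigenvectors.
eigenvalues, U = np.linalg.eigh(L)

# "Graph Fourier" coefficients of f, and the reconstruction of f from them.
coeffs = U.T @ f
print(np.allclose(U @ coeffs, f))    # True: f is exactly a weighted sum of eigenvectors

# Coefficients on high eigenvalues measure how much f varies between adjacent vertices:
# f^T L f = sum over edges of (f(u) - f(v))^2 = sum of eigenvalue * coefficient^2.
print(f @ L @ f, np.sum(eigenvalues * coeffs**2))   # the two agree
```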
Furthermore, this also extends to surfaces, where you have square-integrable functions with respect to some measure, and a Fourier basis corresponding to this. The probabilistic method there is not really very different : instead of running between vertices of a graph, you're running across a surface and seeing where you're going. So the resemblance is remarkable, and this should hopefully have been a decent demonstration of it.