How would you discover Stokes's theorem?

Let $S$ be a smooth oriented surface in $\mathbb R^3$ with boundary $C$, and let $f: \mathbb R^3 \to \mathbb R^3$ be a continuously differentiable vector field on $\mathbb R^3$.

enter image description here

Stokes's theorem states that $$ \int_C f \cdot dr = \int_S (\nabla \times f) \cdot dA. $$ In other words, the line integral of $f$ over the curve $C$ is equal to the integral of the curl of $f$ over the surface $S$. Here the orientation of the boundary $C$ is induced by the orientation of $S$.

Question: How might somebody have derived or discovered this formula? Where does this formula come from?

The goal is to provide an intuitive explanation of Stokes's theorem, rather than a rigorous proof.

(I'll post an answer below.)


Here's an intuitive way to discover Stokes's theorem.

Imagine chopping up the surface $S$ into tiny pieces such that each tiny piece is a parallelogram (or at least, each tiny piece is approximately a parallelogram).

enter image description here

Let $C_i$ be the boundary of the $i$th tiny parallelogram. I'll assume each $C_i$ has the orientation induced by the orientation of $S$. Notice that $$ \tag{1} \sum_i \int_{C_i} f \cdot dr = \int_C f \cdot dr. $$ This is because the sum on the left "telescopes". Everything in the middle cancels out and we are left only with boundary terms. This beautiful step in the derivation is reminiscent of the telescoping sum that appears when deriving the fundamental theorem of calculus in single variable calculus.

To complete our derivation of Stokes's theorem, we must compute the integral of $f$ around the boundary of a tiny parallelogram. Below is a picture of one single tiny parallelogram which is based at a point $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb R^3$ and which is spanned by vectors $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \in \mathbb R^3$. The orientation of the boundary of the parallelogram is indicated by the little direction arrows.

enter image description here

Since this is a very tiny parallelogram, I'll make the approximation that the integral of $f$ along edge 1 is approximately $f(x) \cdot v$, the integral of $f$ along edge 2 is approximately $f(x + v) \cdot w$, the integral of $f$ along edge 3 is approximately $f(x + w) \cdot (-v)$, and the integral of $f$ along edge 4 is approximately $f(x) \cdot (-w)$. Summing these four terms, and pairing edge 1 with edge 3 and edge 2 with edge 4, we find that the integral of $f$ along the boundary of this parallelogram is approximately \begin{align*} &\quad \langle f(x+v) - f(x), w \rangle - \langle f(x + w) - f(x), v \rangle \\ &\approx \langle f'(x) v, w \rangle - \langle f'(x) w, v \rangle \\ &= \langle v, (f'(x)^T - f'(x)) w \rangle \\ &= \left \langle v, \begin{bmatrix} 0 & \frac{\partial f_2(x)}{\partial x_1} - \frac{\partial f_1(x)}{\partial x_2} & \frac{\partial f_3(x)}{\partial x_1} - \frac{\partial f_1(x)}{\partial x_3} \\ \frac{\partial f_1(x)}{\partial x_2} - \frac{\partial f_2(x)}{\partial x_1} & 0 & \frac{\partial f_3(x)}{\partial x_2} - \frac{\partial f_2(x)}{\partial x_3} \\ \frac{\partial f_1(x)}{\partial x_3} - \frac{\partial f_3(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_3} - \frac{\partial f_3(x)}{\partial x_2} & 0 \end{bmatrix} w \right\rangle \\ &= \langle v, w \times (\nabla \times f) \rangle \\ &=\tag{2} \langle \nabla \times f, \underbrace{v \times w}_{\substack{\text{Area vector}\\ \text{for this tiny} \\ \text{parallelogram}}} \rangle. \end{align*}

Here $f_1, f_2$, and $f_3$ are the component functions of $f$ and $ f'(x) = \begin{bmatrix} \frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_1(x)}{\partial x_2} & \frac{\partial f_1(x)}{\partial x_3} \\ \frac{\partial f_2(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_2} & \frac{\partial f_2(x)}{\partial x_3} \\ \frac{\partial f_3(x)}{\partial x_1} & \frac{\partial f_3(x)}{\partial x_2} & \frac{\partial f_3(x)}{\partial x_3} \\ \end{bmatrix} $ is the Jacobian matrix of $f$ at $x$. The vector $\nabla \times f$, which is called the "curl" of $f$ at $x$, is defined by $$ \nabla \times f = \begin{bmatrix} \frac{\partial f_3(x)}{\partial x_2} - \frac{\partial f_2(x)}{\partial x_3} \\ \frac{\partial f_1(x)}{\partial x_3} - \frac{\partial f_3(x)}{\partial x_1} \\ \frac{\partial f_2(x)}{\partial x_1} - \frac{\partial f_1(x)}{\partial x_2} \end{bmatrix}. $$ This is the moment in math when we discover the curl for the first time. Technically, I should write the curl of $f$ at $x$ as $(\nabla \times f)(x)$.

The final step in our derivation of Stokes's theorem is to apply formula (2) to the sum on the left in equation (1). Let $\Delta A_i$ be the "area vector" for the $i$th tiny parallelogram. In other words, the vector $\Delta A_i$ points outwards, and the magnitude of $\Delta A_i$ is equal to the area of the $i$th tiny parallelogram. Let $x^i \in \mathbb R^3$ be the point where the $i$th tiny parallelogram is based. (The $i$ here is a superscript, not an exponent.) Combining formulas (1) and (2) reveals that \begin{align} \int_C f \cdot dr &\approx \sum_i (\nabla \times f)(x_i) \cdot \Delta A_i \\ &\approx \int_S (\nabla \times f) \cdot dA. \end{align} We have discovered the Stokes's theorem formula. It seems plausible that we can make the approximation as accurate as we like by chopping up $S$ into sufficiently small pieces. Thus, we conclude that $$ \int_C f \cdot dr = \int_S (\nabla \times f) \cdot dA $$


Comments:

  • I gave a similar derivation of Green's theorem here.

  • Physicists frequently use similar arguments when deriving Stokes's theorem. Feynman, for example, integrates a vector field around a little square in the $xy$-plane, then recognizes that the result can be expressed in terms of the curl vector. Here is the relevant passage from Feynman: enter image description here However, how did Feynman discover the curl in the first place? He did it by treating the gradient operator $\nabla$ as a vector, and symbolically computing the cross product of this "vector" with $f$. I find that to be interesting and characteristically Feynman, but I also want a more direct way to discover Stokes's theorem, the same way that we discovered Green's theorem. (See section 3-6 and section 2-5 of volume II of the Feynman Lectures on Physics for reference.)

    The book Div, Grad, Curl and All That computes the three components of the curl vector by integrating a vector field around small rectangles which are parallel to either the $xy$-plane or the $xz$-plane or the $yz$-plane. The author remarks, "It turns out that these three quantities are the Cartesian components of a vector. To this vector we give the name 'curl of $\mathbf F$,' which we write $\text{curl } \mathbf F$." In other words, now paraphrasing and switching to my notation, they assume the existence of a vector $(\nabla \times f)(x)$ which satisfies $$ (\nabla \times f)(x) \cdot \Delta A \approx \int_{\partial E} f \cdot dr $$ for any tiny planar surface $E$ containing $x$ with area vector $\Delta A$. By considering the special cases where $E$ is a rectangle and $\Delta A$ is parallel to either $\hat i$ or $\hat j$ or $\hat k$, they derive the components of $(\nabla \times f)(x)$. Here is the relevant passage: enter image description here

  • When deriving Green's theorem and the divergence theorem, physicists typically chop up the region that we are integrating over into tiny rectangles or tiny boxes. I think the most clear and elegant way to make this strategy work for Stokes's theorem is to chop up $S$ into tiny parallelograms. In fact, I think we should also use parallelograms or parallelepipeds when deriving Green's theorem and the divergence theorem. This strategy can even be used to derive the generalized Stokes's theorem and to discover the exterior derivative (by chopping up a smooth manifold into tiny parallelepipeds).

  • One way to chop up $S$ into tiny parallelograms is to start with a rectangular region $R$ that is chopped up into tiny rectangles, then smoothly morph $R$ onto $S$. If $S$ is not diffeomorphic to a rectangular region, then $S$ can at least be broken into simpler pieces, each of which is diffeomorphic to a rectangular region.

  • When deriving equation (2), I used the first-order Taylor approximation $$ \tag{3} f(x + v) - f(x) \approx f'(x) v. $$ The approximation is good when $v$ is small. The Jacobian matrix $f'(x)$ is also called the derivative of $f$ at $x$. The approximation (3), which Terence Tao refers to as "Newton's approximation", is the key idea of calculus. It is essentially the definition of $f'(x)$. The fundamental strategy of calculus is to take a nonlinear function $f$ (difficult) and approximate it locally by a linear function (easy). When deriving the formulas of calculus, we always find that we use the approximation (3) at the crucial moment.

  • It would also be ok to evaluate $f$ at the midpoints of the edges when approximating the integral of $f$ along each edge of the tiny parallelogram. So the integral of $f$ along edge 1 is approximately $f(x + v/2) \cdot v$, the integral of $f$ along edge 2 is approximately $f(x + v + w/2) \cdot w$, etc. These are typically more accurate approximations and the calculation works out equally nicely. However, since our goal is just to provide an intuitive derivation of Stokes's theorem, we might as well keep the calculation as simple as possible.


For me discovering is different from proving or deriving which seems to be the focus of some of the other responses. Of course - what is obvious is subjective. But, for my part I can say how it happened for me. That is, before I heard of Stoke's theorem, I was on my way to a similar conclusion, and the following is why.

First, I knew of the fundamental theorem of calculus of real functions of a real value. And I also had some idea that if something is flowing then the amount of it that passes out through a boundary has to give the chance in the amount inside. That means that there is a boundary integral that relates to a rate of change. If you think about the fundamental theorem of real function you can see the square bracket notation - the difference in the value of a function at the end points - as a kind of directed boundary integral. Integral and sum are clearly very closely related.

Now from a theory of flow one has that $-\nabla\dot{}f$ is the rate of change and $\hat{n} \cdot f$ is the flow through the boundary - where $f$ is the flow and $\hat{n}$ is the normal unit vector. So what we have is a interior integral of some kind of derivative of a function is equal to a boundary integral of some operator on the original function.

At this point I got a bit perplexed because the boundary integral was not the function but a projection of it. And there is an indefinite number of differential operators that could be involved. Then, it dawned on me that the relation was not really about $f$, but about $-\nabla\cdot$ and $\hat{n}\cdot$.

So, if we say that $-\nabla\cdot$ is the derivative of the operator $\hat{n}\cdot$ in some sense, then we have the following thought: the boundary operator and the differential operator can be exchanged inside an integral. And that is one way of looking at Stoke's theorem.

[Yes, details skipped, but this is intended to be about discovery].


A first simplification is that in Stokes' theorem, everything is happening on the surface. So in a certain sense it is a 2D theorem, not a 3D theorem. If we restrict to the parametrisation on the surface, Stokes' theorem is precisely Green's theorem.

So: how would you discover Green's theorem?

Let us start with a more intuitive theorem: the divergence theorem. The divergence theorem says that if you have a steady flow of water with a bunch of sources and sinks, then for any volume $V$ we pick, the net amount of water created inside $V$ is equal to the net amount water flowing through the surface of $V$.

I hope the idea behind this theorem is intuitive enough, even though translating it into formal language is another matter.

Notice now that there surely is a 2D version of the divergence theorem, where all the flowing of the water is happening in the plane. That theorem says that the net source in an area is equal to the net flow through the boundary curve of the area.

As it turns out, this is precisely Green's theorem. This is very unclear the way it is usually presented, where Green's theorem says something like "the amount of swirly in an area is equal to the line integral along the boundary", whatever that means.

Fortunately, it becomes much clearer when we rotate each vector of the vector field by $90$ degrees. Rotating the vector field by $90$ degrees turns every bit of swirly into a bit of divergence! And it turns the line integral along the curve into the flow through the curve, a much more intuitive concept.

If we look at the statement it becomes more clear. The usual statement is if you have a vector field $(L,M)$ then we have

$$\oint Ldx + Mdy = \iint \left(\frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}\right)\,dxdy$$

Notice that rotating by $90$ degrees gives $(L,M) \mapsto (-M,L)$ so we get

$$\oint Ldy - Mdx = \iint \left(\frac{\partial M}{\partial x} + \frac{\partial L}{\partial y}\right)\,dxdy$$

We see the 2D divergence $\frac{\partial M}{\partial x} + \frac{\partial L}{\partial y}$ on the right hand side, and on the left we see $$Ldy - Mdx = (L,M) \cdot (dy,-dx) = (L,M) \cdot \hat{n},$$ i.e. the flow through the boundary.