Proof of fundamental theorem of integral calculus

This is the third chapter of my "share your knowledge, Q&A style" trilogy: Spectral Theorem, Weak Compactness of the Closed Unit Ball of a Hilbert Space, and now FTIC. I looked around the Web, and all I found were incomplete proofs, in the sense that they either assumed results I had never seen or proved weaker statements. This is why I am posting this. Let me outline the strategy, and then prove every point.

1. Proving the easier direction: the integral function of an $L^1$ function is absolutely continuous (a linked reference comes in handy for that);
2. Proving the weaker statement that if $f$ is absolutely continuous and a.e. differentiable, then it is the integral of its derivative;
3. Proving an absolutely continuous function has bounded variation;
4. Proving a BV function is the difference of two monotone increasing functions;
5. Proving a version of the Vitali covering theorem;
6. Using step 5 to prove a monotone increasing function is a.e. differentiable;
7. Combining step 6 and step 4 to conclude a BV function is a.e. differentiable, combining this with step 3 to deduce the property for a.c. functions, and using step 2 to conclude the FTIC.

Since this is a huge thing, I will post this in bits.

Step 1: the integral function of an $L^1$ function is absolutely continuous.

If $F(x)=F(a)+\int_a^xf(t)\mathrm{d}t$ with $f\in L^1(a,b)$, then:

$$\sum|F(b_i)-F(a_i)|=\sum\left|\int_{a_i}^{b_i}f(t)\mathrm{d}t\right|\leq\sum\int_{a_i}^{b_i}|f(t)|\mathrm{d}t=\int_{\bigcup[a_i,b_i]}|f(t)|\mathrm{d}t.$$

So if we can prove that for all $\epsilon>0$ there exists $\delta>0$ such that $m(A)<\delta$ implies $\int_A|f(t)|\mathrm{d}t<\epsilon$, the result follows. That is where the link comes in. If $f$ is bounded, then $|f|\leq M$ for some $M>0$, hence $\int_A|f|\leq Mm(A)$, and we can set $\delta=\frac\epsilon M$. Otherwise, define $|f|_M(t)=\min\{|f|(t),M\}$. By dominated convergence, $\int_a^b(|f|-|f|_M)\to0$ for $M\to\infty$, hence for sufficiently large $M$ we can make it less than $\frac\epsilon2$, and having fixed that $M$ we set $\delta=\frac\epsilon{2M}$, so that:

$$\int_A|f|=\int_A(|f|-|f|_M)+\int_A|f|_M\leq\int_a^b(|f|-|f|_M)+Mm(A)<\frac\epsilon2+Mm(A),$$

and $m(A)<\frac\epsilon{2M}$ implies $\int_A|f|<\epsilon$. So we are done.
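To make the truncation argument concrete, here is a minimal Python sketch for one unbounded example, $f(t)=t^{-1/2}$ on $(0,1)$. The example function and the helper names `tail_mass`, `delta_for` and `worst_case_integral` are my own assumptions for illustration, and the closed forms in the comments are computed by hand for this particular $f$ only.

```python
import math

def tail_mass(M):
    """Integral over (0,1) of |f| - min(|f|, M) for f(t) = t**-0.5 and M >= 1.

    Hand computation: the integrand is t**-0.5 - M on (0, 1/M**2) and 0
    elsewhere, which integrates to 2/M - 1/M = 1/M.
    """
    return 1.0 / M

def delta_for(eps):
    """Pick M with tail_mass(M) < eps/2, then delta = eps / (2M), as in step 1."""
    M = 4.0 / eps              # tail_mass(M) = eps/4 < eps/2 (needs eps <= 4 so that M >= 1)
    return eps / (2.0 * M), M

def worst_case_integral(delta):
    """Largest possible integral of |f| over a set of measure delta.

    |f| is decreasing, so the worst set is [0, delta], whose integral is
    2 * sqrt(delta).
    """
    return 2.0 * math.sqrt(delta)

for eps in (1.0, 0.1, 0.001):
    delta, M = delta_for(eps)
    print(eps, M, delta, worst_case_integral(delta))
```

For every tested $\epsilon$, the worst-case integral over a set of measure $\delta$ stays below $\epsilon$, as the estimate predicts.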

Step 2: an absolutely continuous and almost everywhere differentiable function is an integral function.

For each $n\in\mathbb{N}$ we partition $[a,b]$ into intervals of length $\frac{b-a}{2^n}$ by setting $x_{i,n}=\frac{i}{2^n}(b-a)+a$. Set:

$$h_n(x)=\sum_{i=1}^{2^n}\frac{f(x_{i,n})-f(x_{i-1,n})}{x_{i,n}-x_{i-1,n}}\chi_{[x_{i-1,n},x_{i,n})}.$$

On one hand, since $f$ is a.e. differentiable, and the $h_n$ are essentially difference quotients on thinner and thinner intervals, $h_n\to f'$ a.e. On the other hand, we see that:

$$\int_a^bh_n(x)\mathrm{d}x=\sum_{i=1}^{2^n}\int_{x_{i-1,n}}^{x_{i,n}}h_n(x)\mathrm{d}x=\sum_{i=1}^{2^n}[f(x_{i,n})-f(x_{i-1,n})]=f(b)-f(a).$$

So all we have to prove is that the limit passes under the integral. We will prove that in fact the convergence is in $L^1$. Fix $\epsilon>0$. $f$ is a.c., so we find $\delta$ such that $\sum(b_i-a_i)<\delta\implies\sum|f(b_i)-f(a_i)|<\frac\epsilon4$. Since $f'\in L^1$ (by steps 3, 4 and 6, $f$ is a difference of two increasing functions whose derivatives are integrable), the argument of step 1 gives $\rho$ such that $m(A)<\rho\implies\int_A|f'|<\frac\epsilon4$. We will later prove the following lemma.

Lemma

For each $\epsilon>0$ there exist $k,n_k\in\mathbb{N}$ such that:

$$k\cdot m\left(\left\{x\in I:\sup_{n\geq n_k}|h_n(x)|>k\right\}\right)<\epsilon.$$

We thus choose the $k,n_k$ that the lemma gives for $\min\{\delta,\frac\epsilon4,\rho\}$ in place of $\epsilon$, and call $A$ the corresponding set in the lemma. This implies:

\begin{align*} m(A)<{}&\delta, \\ k\cdot m(A)<{}&\frac\epsilon4, \\ \int_A|f'(x)|\mathrm{d}x<{}&\frac\epsilon4. \end{align*}

Indeed, $km(A)<\min\{\delta,\frac\epsilon4,\rho\}$ by choice of $k,n_k$. Since $k\geq1$, the first inequality follows from this, and the second is immediate. The third holds because $m(A)\leq km(A)<\rho$, and by choice of $\rho$. We now remark that:

\begin{align*} \int_a^b|h_n(x)-f'(x)|\mathrm{d}x={}&\int_{I\smallsetminus A}|h_n(x)-f'(x)|\mathrm{d}x+\int_A|h_n(x)-f'(x)|\mathrm{d}x< \\ {}<{}&\int_{I\smallsetminus A}|h_n(x)-f'(x)|\mathrm{d}x+\int_A|h_n(x)|\mathrm{d}x+\frac\epsilon4, \end{align*} where the inequality is the triangle inequality plus the third inequality above. By definition of $A$, for $n\geq n_k$ we have $|h_n(x)|\leq k$ for all $x\in I\smallsetminus A$, hence $|h_n-f'|\leq k+|f'|$ on $I\smallsetminus A$ for $n\geq n_k$. Since $h_n\to f'$ a.e., dominated convergence implies that the integral over $I\smallsetminus A$ tends to zero, so for $n$ big enough it is less than $\frac\epsilon4$. For such a choice of $n$, we have, combining this remark with the inequality above, that:

$$\int_a^b|h_n(x)-f'(x)|\mathrm{d}x<\frac\epsilon2+\int_A|h_n(x)|\mathrm{d}x.$$

Now we split $A$ into $B=\{x\in A:|h_n(x)|\leq k\}$ and $C=A\smallsetminus B$, for each fixed $n\geq\max\{n_\epsilon,n_k\}$, where $n_\epsilon$ is such that $n\geq n_\epsilon$ implies the above bound on the integral over $I\smallsetminus A$. For the integral over $B$, we have:

$$\int_B|h_n(x)|\mathrm{d}x\leq km(B)\leq km(A)<\frac\epsilon4,$$

by the second inequality of the series of three above. If $C=\varnothing$, certainly the integral on $C$ is bounded by $\frac\epsilon4$. We now use absolute continuity of $f$ to prove this actually always holds, and thus conclude this proof. For $n\geq n_k$ the set $\{x\in I:|h_n(x)|>k\}$ is contained in $A$ by definition of $A$, so it coincides with $C$. Since $h_n$ is constant on intervals of the form $[x_{i-1,n},x_{i,n})$, there exist pairwise distinct indices $i_l$ for $l=1,\dotsc,p$ with $p\leq 2^n$ such that:

$$C=\bigcup_{l=1}^p[x_{i_l-1,n},x_{i_l,n}).$$

Using the first inequality of the series of three above, we get:

$$\sum_{l=1}^p(x_{i_l,n}-x_{i_l-1,n})=m(C)\leq m(A)<\delta,$$

and by choice of $\delta$ from absolute continuity of $f$:

$$\int_C|h_n(x)|\mathrm{d}x=\sum_{l=1}^p\int_{x_{i_l-1,n}}^{x_{i_l,n}}|h_n(x)|\mathrm{d}x=\sum_{l=1}^p|f(x_{i_l,n})-f(x_{i_l-1,n})|<\frac\epsilon4.$$

So finally our big integral is estimated by $\epsilon$, for any $\epsilon$, hence it tends to 0.
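The construction of the $h_n$ can be sketched numerically. The following Python snippet is only an illustration under my own assumptions: the test function $f(x)=x^2$ on $[0,1]$ and the helper names are hypothetical, and the $L^1$ error uses a closed form that is valid for this particular $f$ only. Exact rationals are used so that the telescoping identity holds without rounding.

```python
from fractions import Fraction

def dyadic_slopes(f, a, b, n):
    """Nodes x_{i,n} and the values of h_n on each [x_{i-1,n}, x_{i,n})."""
    N = 2 ** n
    xs = [a + Fraction(i, N) * (b - a) for i in range(N + 1)]
    slopes = [(f(xs[i]) - f(xs[i - 1])) / (xs[i] - xs[i - 1])
              for i in range(1, N + 1)]
    return xs, slopes

def integral_of_h(xs, slopes):
    """Integral of the step function h_n; it telescopes to f(b) - f(a)."""
    return sum(s * (xs[i + 1] - xs[i]) for i, s in enumerate(slopes))

def l1_error_for_square(xs):
    """L^1 distance between h_n and f'(x) = 2x when f(x) = x**2.

    On [u, v) the slope is u + v = f'(midpoint), so |h_n - f'| = 2|x - mid|,
    which integrates to (v - u)**2 / 2 on that interval.
    """
    return sum((xs[i + 1] - xs[i]) ** 2 / 2 for i in range(len(xs) - 1))

square = lambda x: x * x
for n in range(1, 6):
    xs, slopes = dyadic_slopes(square, Fraction(0), Fraction(1), n)
    print(n, integral_of_h(xs, slopes), l1_error_for_square(xs))
```

In this example the integral of $h_n$ is exactly $f(1)-f(0)=1$ for every $n$, while the $L^1$ error is $2^{-(n+1)}$, which tends to zero.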

Proof of the lemma

Fix $\epsilon>0$ and choose $\rho>0$ such that $m(E)<\rho\implies\int_E|f'(x)|\mathrm{d}x<\frac\epsilon2$. Let $N\subseteq I$ be a set of measure zero such that $h_n\to f'$ pointwise outside $N$. Since $f'\in L^1$, $m(\{x\in I\smallsetminus N:|f'(x)|\geq k\})\to0$ for $k\to\infty$, so we can find $k$ big enough so that $m(\{x\in I\smallsetminus N:|f'(x)|\geq k\})<\rho$. This gives us:

$$km(\{x\in I\smallsetminus N:|f'(x)|\geq k\})\leq\int_{\{x\in I\smallsetminus N:|f'(x)|\geq k\}}|f'(x)|\mathrm{d}x<\frac\epsilon2.$$

Let us set:

$$E_j=\left\{x\in I\smallsetminus N:\sup_{n\geq j}|h_n(x)|>k\right\},$$

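for all $j\in\mathbb{N}$. The sets $E_j$ decrease and have finite measure, so $m(E_j)$ tends to the measure of $\bigcap_jE_j$. On that intersection $\limsup_n|h_n(x)|\geq k$, and $h_n\to f'$ outside $N$, so the intersection lies within $\{x\in I\smallsetminus N:|f'(x)|\geq k\}$. Hence we can find $n_k$ such that: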

$$m(E_{n_k})\leq m(\{x\in I\smallsetminus N:|f'(x)|\geq k\})+\frac{\epsilon}{2k},$$

so we multiply by $k$, use the inequality before the definition of $E_j$, and deduce $km(E_{n_k})<\epsilon$.

Step 3: an absolutely continuous function is of bounded variation.

By absolute continuity we find $\delta>0$ such that $\sum(b_i-a_i)<\delta\implies\sum|f(b_i)-f(a_i)|<1$. Let $N$ be the least integer such that $N>\frac{b-a}{\delta}$, and let $a_j:=a+j\frac{b-a}{N}$ for $j=0,1,\dotsc,N$. It follows that:

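$$\bigvee_a^bf=\sum_{j=1}^N\bigvee_{a_{j-1}}^{a_j}f\leq N.$$

Indeed, each $[a_{j-1},a_j]$ has length $\frac{b-a}{N}<\delta$, so every sum $\sum|f(b_i)-f(a_i)|$ over a partition of it is less than 1, and the variation on it is at most 1. Hence, $f$ is BV. As for the first equality, I originally thought it should be an inequality, and remarked that anyways this works all the same, then I found this link and convinced myself it is an equality.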
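As a quick sanity check of the bound by $N$, here is a small Python sketch. The choice $f=\sin$ on $[0,2\pi]$ is my own example: since $|\sin b-\sin a|\leq|b-a|$, the value $\delta=1$ works in the definition of absolute continuity with bound 1. The helper names are hypothetical, and the variation is only approximated by the sum over a fine uniform partition.

```python
import math

def variation_sum(f, points):
    """Sum of |f(q) - f(p)| over consecutive points of a partition."""
    return sum(abs(f(q) - f(p)) for p, q in zip(points, points[1:]))

def pieces_needed(a, b, delta):
    """Least integer N with N > (b - a) / delta, as in step 3."""
    return math.floor((b - a) / delta) + 1

a, b = 0.0, 2.0 * math.pi
delta = 1.0                      # valid for sin, which is 1-Lipschitz
N = pieces_needed(a, b, delta)   # number of subintervals of length < delta
grid = [a + (b - a) * i / 4000 for i in range(4001)]
v = variation_sum(math.sin, grid)
print(N, v)
```

The partition sum comes out at about 4, the true variation of $\sin$ on $[0,2\pi]$, comfortably below the crude bound $N=7$.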

Step 4: a BV function is the difference of two monotone increasing functions.

I'm lucky in this step since I have LaTeX code (the source is a math SX answer), so I will just copy-paste, with a blockquote.

Let $f$ be a function of bounded variation. Let $F(x):=\sup \sum_{j=1}^{n-1}|f(x_{j+1})-f(x_j)|=:\operatorname{Var}[a,x]$, where the supremum is taken over the $x_1,\ldots,x_n$ which satisfy $a=x_1<x_2<\ldots<x_n=x$. Since $f$ is of bounded variation, $F$ is bounded, and by definition increasing. Let $G:=F-f$. We have to show that $G$ is bounded and increasing. Boundedness follows from this property for $f$ and $F$. Now fix $a\leq x_1<x_2\leq b$. We have $$G(x_2)-G(x_1)=F(x_2)-f(x_2)-F(x_1)+f(x_1)\geq 0$$ because $\operatorname{Var}[a,x_1]+f(x_2)-f(x_1)\leq \operatorname{Var}[a,x_1]+|f(x_2)-f(x_1)|\leq \operatorname{Var}[a,x_2]$.

If $f$ and $g$ are of bounded variation, so is $f-g$. If $f$ is increasing, then for $a=x_1<x_2<\ldots<x_n=b$ we have $\sum_{j=1}^{n-1}|f(x_{j+1})-f(x_j)|=|f(b)-f(a)|$, so $f$ is of bounded variation. So the difference of two bounded monotonic increasing functions is of bounded variation.

Remark

This proves, in fact, more than the step I need, since it proves that a BV function is a difference of monotone increasing functions, but also that the converse holds, provided the two monotone functions are bounded. Thanks Davide Giraudo for this answer.
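The decomposition $f=F-G$ can be checked on a grid with a short Python sketch. This is a discrete analogue under my own assumptions: the variation is taken only over the grid points, the test function $x^3-3x$ on $[-2,2]$ is a hypothetical example, and exact rationals are used so that the monotonicity checks are not affected by rounding.

```python
from fractions import Fraction

def jordan_on_grid(values):
    """Running variation F and G = F - f for the values of f on a grid."""
    F = [Fraction(0)]
    for prev, cur in zip(values, values[1:]):
        F.append(F[-1] + abs(cur - prev))    # variation accumulated so far
    G = [Fi - vi for Fi, vi in zip(F, values)]
    return F, G

def is_nondecreasing(seq):
    return all(x <= y for x, y in zip(seq, seq[1:]))

grid = [Fraction(k, 10) for k in range(-20, 21)]    # uniform grid on [-2, 2]
values = [x ** 3 - 3 * x for x in grid]
F, G = jordan_on_grid(values)
print(is_nondecreasing(F), is_nondecreasing(G), F[-1])
```

Each increment of $G$ is $|\Delta f|-\Delta f\geq0$, which is the grid version of the inequality in the quoted proof. Here the grid contains the turning points $\pm1$, so the last value of $F$ is the total variation $4+4+4=12$.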

Step 5: Vitali Covering Theorem (or some version of it)

Definition

If $E\subseteq\mathbb{R}$, I will call a collection $\Gamma$ of closed intervals in $\mathbb{R}$ a Vitali covering of $E$ if for all $\delta>0$ and all $x\in E$ we can find an interval $I\in\Gamma$ such that $x\in I$ and $\ell(I)<\delta$, where $\ell([a,b])=b-a$.

With that, the precise statement I intend to prove now is the following.

Theorem

Let $E\subseteq\mathbb{R}$ have finite Lebesgue outer measure and let $\Gamma$ be a Vitali covering of $E$. Then, for $\epsilon>0$, we can find a finite disjoint collection $\{I_1,\dotsc,I_N\}$ of intervals in $\Gamma$ such that:

$$\lambda^\ast\left(E\smallsetminus\bigcup_{n=1}^NI_n\right)<\epsilon,$$

$\lambda^\ast$ being the Lebesgue outer measure.

To prove this, let $G$ be an open set containing $E$ with finite Lebesgue measure. $\Gamma$ is a Vitali covering, so, discarding the intervals not contained in $G$, we may assume $G$ contains the union of $\Gamma$. We now choose a sequence $(I_n)_{n=1,\dotsc}$ of disjoint intervals of $\Gamma$ recursively. We choose first any $I_1\in\Gamma$. Then supposing $I_1,\dotsc,I_n$ have been defined, we set $k_n$ to be the supremum of the lengths of those intervals of $\Gamma$ which are disjoint from all $I_k$:

$$k_n:=\sup\{\ell(I):I\in\Gamma,I\cap I_k=\varnothing\,\,\forall k=1,\dotsc,n\}.$$

We choose $I_{n+1}$ from $\Gamma$ such that it is disjoint from the $n$ previously chosen intervals and has length greater than $\frac{k_n}2$ (the supremum $k_n$ need not be attained). If at some point no such interval exists, then the finitely many closed intervals chosen so far already cover $E$, and we are done; so assume the process goes on forever. Since the chosen intervals are pairwise disjoint, their union has measure the sum of their lengths (a series, in fact), and that series converges because the union is contained in $G$. This implies that $\ell(I_n)\to0$, hence $k_n\to0$. Also, since the tails of a convergent series tend to zero, we can find $N>0$ such that:

$$\sum_{n=N+1}^\infty\ell(I_n)<\frac\epsilon5.$$

So if we can prove that $I_1,\dotsc,I_N$ leave out at most $\epsilon$ from $E$, we have concluded. For this purpose, we set $J_n:=I_n+2\ell(I_n)[-1,1]$, for all $n\in\mathbb{N}$, so that $\ell(J_n)=5\ell(I_n)$. If we prove that the $J_n$ with $n>N$ cover what the $I_n$'s from 1 to $N$ leave out of $E$, we are done, since then $\lambda^\ast\left(E\smallsetminus\bigcup_1^NI_n\right)\leq\sum_{n>N}\ell(J_n)=5\sum_{n>N}\ell(I_n)<\epsilon$. So let $x\in E\smallsetminus\bigcup_1^NI_n$. That union is closed and $\Gamma$ is a Vitali covering, so we can find $I\in\Gamma$ with $x\in I$ and $I\subseteq G\smallsetminus\bigcup_1^NI_n$. Then $I\cap I_n\neq\varnothing$ for some $n$, otherwise $\ell(I)\leq k_n$ for all $n$, which contradicts $k_n\to0$. Let $n_0$ be the smallest integer such that $I\cap I_{n_0}\neq\varnothing$. Then $n_0>N$ and $\ell(I)\leq k_{n_0-1}\leq2\ell(I_{n_0})$. It follows that $I\subseteq J_{n_0}$, as desired.
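The greedy selection in the proof can be illustrated on a finite family of closed intervals. The Python sketch below is a simplification under my own assumptions: with finitely many intervals the supremum $k_n$ is attained, so the sketch simply picks the longest admissible interval each time, and the function names and the sample family are hypothetical.

```python
def greedy_disjoint(intervals):
    """Pick pairwise disjoint closed intervals, longest first."""
    chosen = []
    for a, b in sorted(intervals, key=lambda I: I[1] - I[0], reverse=True):
        # closed intervals [a, b] and [c, d] are disjoint iff b < c or d < a
        if all(b < c or d < a for c, d in chosen):
            chosen.append((a, b))
    return chosen

def enlarge(interval):
    """J = I + 2*l(I)*[-1, 1], an interval five times as long as I."""
    a, b = interval
    length = b - a
    return (a - 2 * length, b + 2 * length)

def covered(interval, chosen):
    """True if the interval lies inside the enlargement of a chosen one."""
    a, b = interval
    return any(c <= a and b <= d for c, d in map(enlarge, chosen))

family = [(0, 8), (6, 10), (9, 12), (20, 22), (21, 22)]
chosen = greedy_disjoint(family)
print(chosen, all(covered(I, chosen) for I in family))
```

Every discarded interval meets a chosen interval that is at least as long, so it lies inside the corresponding enlargement, which is the finite counterpart of $I\subseteq J_{n_0}$.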

Step 6: a monotone increasing function is almost everywhere differentiable.

Actually, we prove something more than what we need: a sort of FTIC for monotone functions.

Theorem

An increasing real-valued function $f$ on an interval $[a,b]$ is differentiable almost everywhere. Its derivative $f'$ is measurable and:

$$\int_a^bf'(x)\mathrm{d}x\leq f(b)-f(a).$$

We set:

\begin{align*} D^+f(x)={}&\limsup_{h\to0^+}\frac{f(x+h)-f(x)}{h} & D^-f(x)={}&\limsup_{h\to0^-}\frac{f(x+h)-f(x)}{h} \\ D_+f(x)={}&\liminf_{h\to0^+}\frac{f(x+h)-f(x)}{h} & D_-f(x)={}&\liminf_{h\to0^-}\frac{f(x+h)-f(x)}{h}. \end{align*}

So high sign, limsup; low sign, liminf; + sign, $h\to0^+$; - sign, $h\to0^-$. We further set:

$$A=\{x\in[a,b]:D^+f(x)>D_-f(x)\} \qquad B=\{x\in[a,b]:D^-f(x)>D_+f(x)\}.$$

For any $x$, we have $D_-f(x)\leq D^-f(x)$ and $D_+f(x)\leq D^+f(x)$. If in addition $x\notin A\cup B$, we have:

$$D^+f(x)\leq D_-f(x)\leq D^-f(x)\leq D_+f(x)\leq D^+f(x),$$

implying they are all equal, and hence $f'(x)$ exists. So if we show $A$ and $B$ have measure 0, $f$ is differentiable almost everywhere. We work for $A$, and $B$ is dealt with in much the same way. Set:

$$A_{s,t}=\{x\in[a,b]:D^+f(x)>s>t>D_-f(x)\}.$$

Clearly we have:

$$A=\bigcup_{\substack{s>t \\ s,t\in\mathbb{Q}}}A_{s,t},$$

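and that is a countable union, so if we prove all those sets have measure zero, then we are done. Fix $s>t$, set $a=\lambda^\ast(A_{s,t})$ (not to be confused with the left endpoint of the interval), fix $\epsilon>0$, and choose an open set $O\supseteq A_{s,t}$ with $\lambda(O)<a+\epsilon$. By definition of $D_-f(x)$, for all $x\in A_{s,t}$ there exist arbitrarily small intervals $[x-h,x]$ contained in $O$ with $f(x)-f(x-h)<th$. The collection of such intervals is a Vitali covering of $A_{s,t}$. By step 5, we can find finitely many disjoint intervals $I_1,\dotsc,I_M$ in this collection such that: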

$$\lambda^\ast\left(A_{s,t}\smallsetminus\bigcup_{j=1}^MI_j\right)<\epsilon.$$

Let us say $I_j=[x_j-h_j,x_j]$ for all $j=1,\dotsc,M$. Then we have:

$$\sum_{j=1}^M[f(x_j)-f(x_j-h_j)]<t\sum_{j=1}^Mh_j\leq t\lambda(O)<t(a+\epsilon).$$

Let:

$$G=A_{s,t}\cap\left(\bigcup_{j=1}^M(x_j-h_j,x_j)\right).$$

By definition of $D^+f(x)$, for each $y\in G$ there exist arbitrarily small intervals $[y,y+k]$ contained in some $I_j$ such that $f(y+k)-f(y)>sk$. Again, by step 5 there exists a finite disjoint collection of such intervals $\{J_1,\dotsc,J_K\}$ such that:

$$\lambda^\ast\left(G\smallsetminus\bigcup_{i=1}^KJ_i\right)<\epsilon.$$

It follows that:

$$\lambda^\ast\left(\bigcup_{i=1}^KJ_i\right)>\lambda^\ast(G)-\epsilon.$$

But $A_{s,t}\smallsetminus G$ and $A_{s,t}\smallsetminus\bigcup_1^MI_j$ differ by at most the finitely many endpoints of the $I_j$, so they have the same outer measure. Hence:

$$\lambda^\ast(A_{s,t})\leq\lambda^\ast(A_{s,t}\smallsetminus G)+\lambda^\ast(G)=\lambda^\ast(G)+\lambda^\ast\left(A_{s,t}\smallsetminus\bigcup_{j=1}^MI_j\right)<\lambda^\ast(G)+\epsilon.$$

Consequently:

$$\lambda^\ast\left(\bigcup_{i=1}^KJ_i\right)>\lambda^\ast(G)-\epsilon>\lambda^\ast(A_{s,t})-2\epsilon=a-2\epsilon.$$

Now suppose $J_i=[y_i,y_i+k_i]$ for all $i=1,\dotsc,K$. Each $J_i$ was chosen contained in $I_j$ for some $j$. If we sum over those $i$ for which $J_i\subseteq I_j$, we find:

$$\sum_{J_i\subseteq I_j}[f(y_i+k_i)-f(y_i)]\leq f(x_j)-f(x_j-h_j),$$

because $f$ is increasing. Hence:

$$s(a-2\epsilon)<s\sum_{i=1}^Kk_i<\sum_{i=1}^K[f(y_i+k_i)-f(y_i)]\leq\sum_{j=1}^M[f(x_j)-f(x_j-h_j)]<t(a+\epsilon).$$

Summing up, for all $\epsilon$ we have:

$$s(a-2\epsilon)<t(a+\epsilon),$$

which, letting $\epsilon\to0$, means $sa\leq ta$. But if $a>0$ we divide and get $s\leq t$, contradicting the choice $s>t$. $a<0$ is not allowed since $a$ is an outer measure. Hence $a=0$, as desired.

Now we have $\frac{f(x+h)-f(x)}{h}$ has a limit for almost every $x$. We define $g(x)$ to be that limit where it exists, and 0 elsewhere. Set $f(x)=f(b)$ for $x>b$ and define:

$$g_n(x)=n\left[f\left(x+\frac1n\right)-f(x)\right],$$

for $a\leq x\leq b$. Each $g_n$ is nonnegative since $f$ is increasing, and $g_n$ converges to $f'$ almost everywhere. Also:

$$\int_a^bg_n(x)\mathrm{d}x=n\left[\int_b^{b+\frac1n}f(x)\mathrm{d}x-\int_a^{a+\frac1n}f(x)\mathrm{d}x\right]\leq f(b)-f(a).$$

By Fatou's lemma:

$$\int_a^bf'(x)\mathrm{d}x\leq\liminf_{n\to\infty}\int_a^bg_n(x)\mathrm{d}x\leq f(b)-f(a),$$

which completes our proof.
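The inequality in the theorem can be strict. As a minimal sketch (my own example, not part of the proof), take the increasing step function on $[0,1]$ that jumps from 0 to 1 at $\frac12$: its derivative is 0 almost everywhere, yet the integrals of the $g_n$ do not tend to 0. The Python code below evaluates $\int_0^1g_n$ exactly with a midpoint rule on a grid of step $\frac1{2n}$, which is exact here because $g_n$ is constant on every cell of that grid; the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def f(x):
    """Increasing step function on [0, 1], equal to f(1) = 1 beyond 1."""
    return Fraction(1) if x >= HALF else Fraction(0)

def g(n, x):
    """g_n(x) = n * (f(x + 1/n) - f(x)), as in the proof."""
    return n * (f(x + Fraction(1, n)) - f(x))

def integral_g(n):
    """Exact integral of g_n over [0, 1].

    g_n only changes value at 1/2 - 1/n and 1/2, both multiples of 1/(2n),
    so the midpoint rule on cells of width 1/(2n) is exact.
    """
    h = Fraction(1, 2 * n)
    return sum(g(n, (i + HALF) * h) * h for i in range(2 * n))

for n in (1, 2, 5, 10):
    print(n, integral_g(n))
```

For $n\geq2$ the integral of $g_n$ equals $f(1)-f(0)=1$, while $\int_0^1f'=0$, so Fatou's lemma gives a strict inequality in this example.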

Step 7: conclusion.

Step 6 tells us a monotone increasing function is almost everywhere differentiable (and a little extra). But step 4 says a BV function $f$ is $g-h$ with $g,h$ monotone increasing. $g$ will be differentiable outside $N_g$, and $h$ outside $N_h$, both zero-measure sets. Hence, their union has measure zero, and outside that union they are both differentiable, which by linearity of the derivative implies $f$ is. So a BV function is a.e. differentiable. But an a.c. function is BV by step 3, hence a.e. differentiable. Finally, we can say step 2 does not need the a.e. differentiability hypothesis, which proves the other direction of our statement.

Remarks

My original strategy was with the same first three steps, but then I planned to establish the Simple Vitali Lemma found here on pp. 3-5, taking the definitions of p. 27 (Lebesgue set) and 31 (Regularly shrinking sets) of the same document to plug them into the proof of the theorem at the end (pp. 35-38), to finally prove the monotone case as is done here (Theorem 24, pp. 9-10), and then prove a BV function is the difference of two monotone functions. However, this is rather longer than what I did above, so thanks @Chilango for that reference, it shortened my work (and my post) by a significant amount.
The contents of those documents are anyway pretty interesting.
As are those of these two: one and two, which probably have much in common with the other two. So these are a couple of extra references for the curious. And finally this post is over.
In particular, those two references hide the proof of the Lebesgue differentiation theorem, stating that if $f$ is an integrable (i.e. $L^1$) function, then:

$$\lim_{r\to0}\frac{1}{|B_r(x)|}\int_{B_r(x)}f\mathrm{d}\mu=f(x),$$

for a.e. $x$. In particular, for single-variable functions, this means an integral function is a.e. differentiable and its derivative is a.e. equal to the function it is an integral of, if said function is $L^1$.
To establish that, a somewhat more sophisticated version of the Vitali theorem above is proved, one that seems much like the Simple Vitali Lemma of my older reference;
The Hardy-Littlewood theorem is also proved, a theorem giving an estimate related to the "maximal function" of a function;
Lastly, on p. 13 of this reference, the first of the two, while all the rest of what I just mentioned is in the second one, the density of $\mathcal{C}_c$ in $L^1$ is proved; this is needed for the differentiation theorem, to approximate with continuous functions, where the maximal function is $0$.