Why is this definition of integrability by sup and inf integrals necessary?

I'm learning the part about integrals in analysis, and the book basically starts throwing things at me without giving any motivation for them. It starts with:

$m=\inf\{f(x), x\in [a,b]\}$

and

$M = \sup\{f(x), x\in[a,b]\}$

Of course we have $m\le f(x)\le M$ for all $x\in [a,b]$.

Then, it defines a partition $P$ of $[a,b]$ which is $\{t_0, t_1, \cdots, t_n\}$ and creates the notation $m_i=\inf\{f(x), t_{i-1}\le x\le t_i\}$, $M_i=\sup\{f(x), t_{i-1}\le x\le t_i\}$. Then it defines $w_i = M_i -m_i$.

After that, the book defines the inferior and superior sums:

$s(f, P) = m_1(t_1-t_0)+\cdots +m_n(t_n-t_{n-1})$

$S(f, P) = M_1(t_1-t_0)+\cdots +M_n(t_n-t_{n-1})$

I can understand these sums: one approximates the area under the curve from below, and the other approximates it from above, or something like this (not a rigorous explanation).

Then it defines the inferior and superior integrals like this:

$$\text{inferior}\int_{a}^{b} f(x)\,dx = \sup_P s(f,P)$$

$$\text{superior}\int_{a}^{b} f(x)\,dx = \inf_P S(f,P)$$

I understand that, for the area below the curve, it's taking the greatest such area, and for the area above the curve, it's taking the smallest. This is a really good approximation, if not the integral itself.

Now, I have problems with the following things:

It says that if we have a 'thinner' partition (don't know the term in English), that is, $P\subset Q$, we have: $s(f,P)\le s(f,Q)$ and $S(f,Q)\le S(f,P)$.

Then it proves that for any partitions $P$ and $Q$ of $[a,b]$ we have $s(f,P)\le S(f,Q)$ and uses it to later prove:

$$m(b-a)\le \text{inferior} \int_{a}^{b}f(x)\,dx\le \text{superior} \int_{a}^{b}f(x)\,dx\le M(b-a)$$

I don't know the point of this inequality.

Then, the book says the following:

Let $P_0$ be a partition of $[a,b]$. If we consider the sums $s(f, P)$ and $S(f,P)$ relative to the partitions $P$ that 'make $P_0$ thinner', we have the same values for $\text{superior} \int_{a}^{b}f(x)\,dx$ and $\text{inferior} \int_{a}^{b}f(x)\,dx$.

What exactly is this saying? If we choose a thinner partition, do we have the same values for the sup and inf integrals?

Finally, the book says that a function is integrable iff $\text{superior} \int_{a}^{b}f(x)\,dx$ and $\text{inferior} \int_{a}^{b}f(x)\,dx$ are equal. It seems reasonable, because both integrals seem to approach the original integral definition I knew. They're both approaching the area in between the above and below areas.

The main question is: why do I need all this to define integrability? Why can't I say that a function is integrable if the limit of the Riemann sums exists? I know there are other types of integration, but here we're only dealing with Riemann sums. Also, could someone explain to me why these intermediate definitions are needed? I imagine that the 'thinner' thing is kind of a limit, saying that no matter how finely we partition the interval, the integral would be the same. Is this it?


The basic reason why one considers lower and upper Riemann sums is that one may take the $\inf$ or $\sup$ of any set of real values, however it is structured, while taking a limit in the ordinary calculus sense requires a strict scenario regarding the independent variable, as in $\lim_{n\to\infty} x_n$ or $\lim_{x\to\alpha} f(x)$.

Note that even a single upper sum $S(f,P)$ already requires the computation of $n$ suprema, which are "limits" of some sort. But this is usually glossed over in a first treatment of the integral.
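As a rough illustration of lower and upper sums, here is a minimal Python sketch. It assumes that $f$ is nondecreasing, so that each $m_i$ and $M_i$ is attained at an endpoint and no genuine supremum has to be computed. The helper names `lower_upper_sums` and `uniform_partition`, and the choice $f(x)=x^2$ on $[0,1]$, are illustrative assumptions and not taken from the book.

```python
from fractions import Fraction


def lower_upper_sums(f, partition):
    """Return (s(f, P), S(f, P)) for a NONDECREASING function f.

    Assumption: since f is nondecreasing, the inf of f on [t_{i-1}, t_i]
    is f(t_{i-1}) and the sup is f(t_i).
    """
    lower = Fraction(0)
    upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        width = right - left
        lower += f(left) * width
        upper += f(right) * width
    return lower, upper


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    a, b = Fraction(a), Fraction(b)
    return [a + (b - a) * Fraction(k, n) for k in range(n + 1)]


def square(x):
    return x * x


P = uniform_partition(0, 1, 4)
Q = uniform_partition(0, 1, 8)  # Q contains every point of P

s_P, S_P = lower_upper_sums(square, P)  # 7/32 and 15/32
s_Q, S_Q = lower_upper_sums(square, Q)  # 35/128 and 51/128
```

In this example $s(f,P)\le s(f,Q)\le S(f,Q)\le S(f,P)$: passing to the finer partition $Q$ raises the lower sum and lowers the upper sum, and both pairs bracket the value $1/3$.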

A general Riemann sum $\sum_{k=1}^n f(\tau_k)(t_k-t_{k-1})$ involves a lot of data, and in order to define the limit of such sums properly one has to define a "net" structure on the set of partitions, etc.
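To see how much data a single Riemann sum involves, here is a small Python sketch that evaluates $\sum_k f(\tau_k)(t_k-t_{k-1})$ for one fixed partition and three different choices of tags $\tau_k$. The function name `riemann_sum` and the example $f(x)=x^2$ on $[0,1]$ are assumptions made for illustration only.

```python
from fractions import Fraction


def riemann_sum(f, partition, tags):
    """Sum of f(tau_k) * (t_k - t_{k-1}); tag k must lie in [t_{k-1}, t_k]."""
    total = Fraction(0)
    for left, right, tau in zip(partition, partition[1:], tags):
        if not (left <= tau <= right):
            raise ValueError("each tag must lie in its own subinterval")
        total += f(tau) * (right - left)
    return total


def square(x):
    return x * x


P = [Fraction(k, 4) for k in range(5)]  # 0, 1/4, 1/2, 3/4, 1

left_tags = P[:-1]
right_tags = P[1:]
mid_tags = [(left + right) / 2 for left, right in zip(P, P[1:])]

left_sum = riemann_sum(square, P, left_tags)    # 7/32
right_sum = riemann_sum(square, P, right_tags)  # 15/32
mid_sum = riemann_sum(square, P, mid_tags)      # 21/64
```

The same partition gives three different values, so a "limit of Riemann sums" has to control both the partition and the tags. Because $x^2$ is increasing on $[0,1]$, the left and right sums here coincide with the lower and upper sums, and any other choice of tags lands between them.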

Here is a definition of integrability that does not involve any infs and sups, but only "finitary" Riemann sums:

A function $f:\>[a,b]\to{\mathbb R}$ is Riemann integrable over $[a,b]$ if for any given $\epsilon>0$ one can find a partition $P:\ a=t_0<t_1<\ldots<t_n=b$ of $[a,b]$ into subintervals $I_k:=[t_{k-1},t_k]$ and numbers $\Delta_k\geq0$ $\>(1\leq k\leq n)$ such that for all $k\in[n]$ one has the estimate $$\bigl|f(y)-f(x)\bigr|\leq\Delta_k\qquad(x, y\in I_k)\ ,$$ while $$\sum_{k=1}^n \Delta_k(t_k-t_{k-1})\leq\epsilon\ .$$

This definition expresses the intuition that over each $I_k$ the graph of $f$ can be covered with a tiny rectangle of area $\Delta_k(t_k-t_{k-1})$ such that the sum of the areas of these rectangles is $\leq\epsilon$. This allows, e.g., for jump discontinuities of $f$, but there shouldn't be too many of them.
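The criterion can be tried out on a function with a single jump. The following Python sketch is only an illustration under stated assumptions: $f$ is a nondecreasing step function on $[0,1]$ with a jump at $1/2$, the partitions are uniform, and $\Delta_k = f(t_k)-f(t_{k-1})$, which bounds $|f(y)-f(x)|$ on $I_k$ precisely because $f$ is nondecreasing. The names `step`, `oscillation_total` and `pieces_needed` are hypothetical helpers.

```python
from fractions import Fraction


def step(x):
    """Nondecreasing step function with a single jump at x = 1/2."""
    return Fraction(1) if x >= Fraction(1, 2) else Fraction(0)


def oscillation_total(f, a, b, n):
    """Sum of Delta_k * (t_k - t_{k-1}) over a uniform partition with n pieces.

    Assumption: f is nondecreasing, so Delta_k = f(t_k) - f(t_{k-1})
    bounds |f(y) - f(x)| for all x, y in [t_{k-1}, t_k].
    """
    a, b = Fraction(a), Fraction(b)
    points = [a + (b - a) * Fraction(k, n) for k in range(n + 1)]
    total = Fraction(0)
    for left, right in zip(points, points[1:]):
        total += (f(right) - f(left)) * (right - left)
    return total


def pieces_needed(f, a, b, eps):
    """Smallest power of two n whose uniform partition meets the eps bound."""
    n = 1
    while oscillation_total(f, a, b, n) > eps:
        n *= 2
    return n


eps = Fraction(1, 100)
n = pieces_needed(step, 0, 1, eps)        # 128
total = oscillation_total(step, 0, 1, n)  # 1/128
```

Only the one subinterval containing the jump contributes, with $\Delta_k=1$ and width $1/n$, so the total is $1/n$ and can be made smaller than any $\epsilon$. This is the sense in which a jump discontinuity does no harm.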