Lebesgue integral basics
I'm having trouble finding a good explanation of the Lebesgue integral. As per the definition, it is the expectation of a random variable. Then how does it model the area under the curve? Let's take for example the function $f(x) = x^2$. How do we find the integral of $f(x)$ over $[0,1]$ using the Lebesgue integral?
As has been noted, the usual definition of the Lebesgue integral has little to do with probability or random variables (though the notions of measure theory and the integral can then be applied to the setting of probability, where, under suitable interpretations, the (Lebesgue) integral of a certain function corresponds to the expectation of a certain random variable).
But this is not the origin of the Lebesgue integral. Here is an intuitive idea of what the Lebesgue integral is, as compared to the Riemann integral.
Recall from Calculus the idea behind the Riemann integral: the integral $\int_a^b f(x)\,dx$ is meant to represent the net signed area between the $x$-axis, the graph of $y=f(x)$, and the lines $x=a$ and $x=b$. The way we attempt to do this is by breaking up the domain, $[a,b]$, into subintervals $[a=x_0,x_1]$, $[x_1,x_2],\ldots,[x_{n-1},x_n=b]$. Then, on each subinterval $[x_i,x_{i+1}]$ we pick a point $x_i^*$, and we estimate the area under the graph of the function with the rectangle of height $f(x_i^*)$ and base $[x_i,x_{i+1}]$. This leads to the Riemann sums $$ \sum_{i=0}^{n-1} f(x_i^*)(x_{i+1}-x_i)$$ as estimates of the area under the graph. We then consider finer and finer partitions of $[a,b]$ and take limits to estimate the area.
Lebesgue's idea was that instead of partitioning the domain, we will partition the range; if the function takes values between $c$ and $d$, we can divide the range $[c,d]$ into subintervals $[c=y_0,y_1]$, $[y_1,y_2],\ldots,[y_{m-1},y_m=d]$. Then, we let $E_i$ be the set of all points in $[a,b]$ whose value under $f$ lies between $y_i$ and $y_{i+1}$. That is, $$ E_i = f^{-1}([y_i,y_{i+1}]) = \{ x\in[a,b]\,|\, y_i \leq f(x) \leq y_{i+1}\}.$$
If we have a way of assigning a "size" to $E_i$, call it its "measure" $\mu(E_i)$, then the contribution $A$ to the area from the portion of the graph of $y=f(x)$ that lies between the horizontal lines $y=y_i$ and $y=y_{i+1}$ satisfies $$ y_i\mu(E_i) \leq A \leq y_{i+1}\mu(E_i).$$ So Lebesgue suggests approximating the area by picking a number $y_i^*$ between $y_i$ and $y_{i+1}$, and considering the sums $$ \sum_{i=0}^{m-1} \mu(E_i)y_i^*.$$ Then consider finer and finer partitions of $[c,d]$, and this gives finer and finer approximations of the area by these sums. The Lebesgue integral will be the limit of these sums. (The analogy given by Mike Spivey is very apt for the distinction between partitioning the domain and partitioning the range to find the sum.)
But in order for this to make sense, we need to develop a way of measuring fairly intricate subsets of the line, so that we can compute $\mu(E_i)$. So we first develop a way of doing this. It turns out that if you accept the Axiom of Choice, then it is impossible to come up with a way of measuring that (i) assigns to an interval $[a,b]$ the "measure" $b-a$; (ii) is invariant under translation, so that if $F=E+c = \{e+c \mid e\in E\}$ then $\mu(F)=\mu(E)$; (iii) is countably additive: if $E = \cup_{i=1}^{\infty}E_i$ and the $E_i$ are pairwise disjoint, then $\mu(E) = \sum\mu(E_i)$; and (iv) gives every subset of the line a well-defined (possibly infinite) measure. (If you don't accept the Axiom of Choice, then there are models of the reals where we can achieve this.) So one drops requirement (iv), and constructs a measure for which some sets will be "too weird" to have a measure. We then restrict attention to certain kinds of functions (called the measurable functions), which are the ones for which the sets we get in the process described above are all measurable sets. And then we define the Lebesgue integral for those functions, following the idea described above (but one does not define it exactly that way; instead the usual way is to describe $f$ as a limit of functions for which the integral is easy, and then compute the integral of $f$ as a limit of the integrals that are easy).
For your function, $f(x)=x^2$, this is fairly easy: the values all lie between $0$ and $1$, so say that we break up the range into subintervals of length $1/n$, so $y_i = i/n$, $i=0,\ldots,n$. Then $$f^{-1}([y_i,y_{i+1}]) = f^{-1}([i/n, (i+1)/n]) = [\sqrt{i/n},\sqrt{(i+1)/n}],$$ so the $n$th estimate, picking $y_i^* = y_i = i/n$, is just $$\sum_{i=0}^{n-1} (i/n)\left(\sqrt{(i+1)/n} - \sqrt{i/n}\right).$$ Take the limit as $n\to\infty$, and you will get that the limit is $\frac{1}{3}$, as expected. (I will spare you the details; see the end of this answer for a high-power way of getting the answer similar to the way you do it with the Riemann integral.)
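As a numerical sanity check of the limit, here is a minimal Python sketch of the estimate for $f(x)=x^2$ on $[0,1]$, summing over the $n$ range slices $[i/n,(i+1)/n]$, $i=0,\ldots,n-1$, with $y_i^*=i/n$. The function name `lebesgue_lower_sum` is just an illustrative choice, and the measure of each $E_i$ is taken to be the length of the interval $[\sqrt{i/n},\sqrt{(i+1)/n}]$.

```python
import math

def lebesgue_lower_sum(n: int) -> float:
    """Estimate the integral of x^2 over [0, 1] by slicing the RANGE into n pieces."""
    total = 0.0
    for i in range(n):
        y_lo = i / n
        y_hi = (i + 1) / n
        # E_i = f^{-1}([y_lo, y_hi]) = [sqrt(y_lo), sqrt(y_hi)]; its measure is its length.
        measure = math.sqrt(y_hi) - math.sqrt(y_lo)
        # Use the lower value y_lo as the sample height y_i^*.
        total += y_lo * measure
    return total

for n in (10, 100, 1000, 10000):
    print(n, lebesgue_lower_sum(n))
```

Since the measures of the $E_i$ add up to $1$, the estimates using $y_i$ and $y_{i+1}$ differ by exactly $1/n$, so the printed values should sit within $1/n$ below $\frac{1}{3}$.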
It turns out that not every function is Lebesgue-integrable, just like not every function is Riemann-integrable. But every function that is Riemann-integrable will also be Lebesgue integrable, and the value of its Lebesgue integral will be the same as the value of its Riemann integral. But there are functions that are not Riemann-integrable but are Lebesgue-integrable (for example, the characteristic function of the rationals is Lebesgue-integrable, with integral $0$ over any interval, but is not Riemann-integrable). We also have a "Fundamental Theorem of Calculus" for the Lebesgue Integral:
Theorem. If $F$ is a differentiable function, and the derivative $F'$ is bounded on the interval $[a,b]$, then $F'$ is Lebesgue integrable on $[a,b]$ and, for every $x\in[a,b]$, $$\int_a^x F'\,d\mu = F(x) - F(a).$$
Here, the integral is the Lebesgue integral.
In particular, to finally answer the question you ask about your example, since $F(x)=\frac{x^3}{3}$ is a differentiable function whose derivative is bounded over any finite interval, in particular over $[0,1]$, then from this theorem you can deduce that the integral over the interval $[0,1]$ of the derivative $F'(x)=x^2$ is equal to $F(1)-F(0)$; that is, $$\int_0^1 x^2\,d\mu = \int_0^1 \left(\frac{x^3}{3}\right)'\,d\mu = \frac{1}{3} - \frac{0}{3} =\frac{1}{3}.$$
I recommend the book A Garden of Integrals by Frank E. Burk (Dolciani Mathematical Expositions 31, MAA, 2007, ISBN 9-780883-853375); it discusses and compares the Cauchy integral, the Riemann integral, the Riemann-Stieltjes integral, the Lebesgue integral, the Lebesgue-Stieltjes integral, and the Henstock-Kurzweil integral; it also discusses the Wiener and Feynman integral. I just finished reading it recently.
One of my graduate school professors, Erhan Cinlar, used to give the following analogy to explain the intuitive difference between the Lebesgue integral and the Riemann integral.
Suppose you have a pile of coins of different denominations, and you want to know how much money you have. The Riemann integral is like picking up the coins, one-by-one, and adding the denomination of each to a running total. The Lebesgue integral is like sorting the coins by denomination first, and then getting the total by multiplying each denomination by how many you have of that denomination and then adding up those numbers. The methods are different, but you obtain the same result by either method.
In the same way, when both the Riemann integral and the Lebesgue integral are defined, they give the same value. As others have said, though, there are functions for which the Lebesgue integral is defined but the Riemann integral is not, and so in that sense the Lebesgue integral is more general than the Riemann.
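The coin analogy can be written out as a short Python sketch. The pile of coins below is made up for illustration; the first function adds coins one at a time (the Riemann-style count), the second sorts by denomination and multiplies each denomination by how many coins have it (the Lebesgue-style count).

```python
from collections import Counter

# Hypothetical pile of coins, values in cents.
coins = [25, 10, 5, 25, 1, 10, 25, 5, 1, 1]

def total_one_by_one(pile):
    """Riemann-style: walk through the pile and keep a running total."""
    running = 0
    for coin in pile:
        running += coin
    return running

def total_by_denomination(pile):
    """Lebesgue-style: for each value, multiply it by the 'size' of its preimage."""
    counts = Counter(pile)  # denomination -> number of coins with that value
    return sum(value * count for value, count in counts.items())

print(total_one_by_one(coins), total_by_denomination(coins))
```

Here `Counter` plays the role of the measure: it records how large the set of coins with a given value is.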
The Lebesgue integral is a generalization of the usual Riemann integral taught in basic calculus. If the Riemann integral of a function over a set exists, then it equals the Lebesgue integral. So the Lebesgue integral of $x^2$ over $[0,1]$ is just the old $(1/3) 1^3-(1/3)0^3 = 1/3$.
The Lebesgue integral has the benefit of being defined for many more functions than the Riemann integral. Even more importantly the Lebesgue integral has useful limit properties:
- http://en.wikipedia.org/wiki/Dominated_convergence_theorem
- http://en.wikipedia.org/wiki/Monotone_convergence_theorem
The expectation of a random variable is a particular application of the Lebesgue integral where the function to be integrated is the random variable (viewed as a function on the sample space) and the integration is with respect to a probability measure.
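For a finite sample space the integral reduces to a sum, which makes the connection easy to check. The following Python sketch uses a made-up example (two fair dice, with the random variable being the sum of the faces) and computes the expectation twice: once by integrating over the sample space, and once by grouping outcomes by the value of the random variable, in the spirit of partitioning the range.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: two fair dice, each outcome has probability 1/36.
sample_space = [(a, b) for a in range(1, 7) for b in range(1, 7)]
prob = {omega: Fraction(1, 36) for omega in sample_space}

def X(omega):
    """The random variable, viewed as a function on the sample space."""
    return omega[0] + omega[1]

# Integrate X over the sample space with respect to the probability measure.
expectation_over_omega = sum(X(omega) * p for omega, p in prob.items())

# Group outcomes by value: P(X = x) is the measure of the preimage of {x}.
law = defaultdict(Fraction)
for omega, p in prob.items():
    law[X(omega)] += p
expectation_over_values = sum(x * p for x, p in law.items())

print(expectation_over_omega, expectation_over_values)
```

Both computations use exact rational arithmetic, so the two results can be compared with `==` rather than up to rounding.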
You need to look at one of the many probability and measure books for the details. My own favourites are:
- Pollard, A User's Guide to Measure-Theoretic Probability
- Dudley, Real Analysis and Probability
Terence Tao has some online lecture notes:
- http://terrytao.wordpress.com/2010/09/09/245a-notes-1-lebesgue-measure/
- http://terrytao.wordpress.com/2010/09/19/245a-notes-2-the-lebesgue-integral/
- http://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/
The Riemann integral is pretty good and very intuitive. However, the main reason to consider other types of integrals is that "the space of functions that are Riemann integrable", say $R(I)$ where $I\subset\mathbb{R}^n$ is compact, is too small (even though it is a linear space, in the sense that you can add its elements and multiply them by constants).
If you only ever look at piecewise continuous functions that vanish outside a bounded region, then you can get by with the Riemann integral. In mathematical analysis, however, we look at various kinds of limits of functions and we would like the limit functions to stay in "the space" (we want the space to be complete).
About the best we can do in the Riemann case is to look at uniformly convergent sequences $f_n$ on a compact interval $I\subset\mathbb{R}^n$; in that case the limit $\lim f_n\in R(I)$ and $\lim\int f_n =\int \lim f_n$. However, uniform convergence is very rare! (Many Fourier series have discontinuous sums even though their partial sums are continuous, so the convergence cannot be uniform, etc.)
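To see that some hypothesis on the mode of convergence is really needed before swapping limit and integral, here is a small Python sketch with a standard "moving spike" example, which is not taken from the discussion above: $f_n(x)=n$ on $(0,1/n)$ and $0$ elsewhere on $[0,1]$. The names `spike` and `spike_integral` are illustrative. Each $f_n$ is a step function with integral $1$, yet at every fixed $x$ the values $f_n(x)$ are eventually $0$, so the integral of the pointwise limit is $0$.

```python
from fractions import Fraction

def spike(n: int, x: Fraction) -> int:
    """f_n(x) = n on the open interval (0, 1/n), and 0 elsewhere on [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def spike_integral(n: int) -> Fraction:
    """Exact integral of f_n over [0, 1]: a rectangle of height n and width 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 100)  # any fixed point of (0, 1] works the same way
for n in (10, 100, 1000):
    # The value at x drops to 0 once 1/n <= x, but the integral stays at 1.
    print(n, spike(n, x), spike_integral(n))
```

The convergence here is pointwise but not uniform, since the maximum of $f_n$ is $n$.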
The Lebesgue integral can be constructed in several ways (ending up with the same space though). A first try might be to start by norming $R(I)$ with $\|f\|=\int|f|$; we would then get a distance between $f,g\in R(I)$ by $\|f-g\|$, thus ending up with a metric space which we may complete by adding all possible limits. This will not work, however, because even though $R(I)$ is small it is too large (there are unbounded functions such that $\|f\|=\infty$). A better start would be to look at $C(I)$, the space of continuous functions on (the compact set) $I$ (certainly each $f\in R(I)$ is a point-wise limit of $C(I)$ functions). If we norm $C(I)$ in the same manner we would indeed get a normed space, and the completion of that space is $L^1(I)$.
In $L^1$ you can certainly take limits in norm and moreover, as has already been pointed out in other answers, you have many other, better limit theorems, such as Lebesgue's dominated convergence theorem or the monotone convergence theorem. Also, bounded functions of $R(I)$ do belong to $L^1(I)$.
In addition to the above: in order for the suggested norm to be a norm, we need to consider two functions, $f$ and $g$, as equal whenever $\int|f-g|\,dx=0$, which happens, for example, when their values differ only at a single point of $I$.