I'd like to have a correct general understanding of the importance of measure theory in probability theory. For now, it seems that mathematicians work with the general notion of a probability measure and prove theorems at that level because a theorem proved this way automatically holds regardless of whether we work with a discrete or a continuous probability distribution.

Take expected value, for example: we can prove the Law of Large Numbers using its general (measure-theoretic) definition $\operatorname{E} [X] = \int_\Omega X \, \mathrm{d}P$, and then derive the formulas for the discrete and continuous cases (discrete and continuous random variables) without having to prove the theorem separately for each case - one proof instead of two. One could say, by the way, that the Law of Large Numbers justifies the definition of expected value.
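(Sketching what I mean, with notation I hope is right: writing $P_X = P \circ X^{-1}$ for the distribution of $X$, the change-of-variables formula gives

$$\operatorname{E}[X] = \int_\Omega X \, \mathrm{d}P = \int_{\mathbb{R}} x \, \mathrm{d}P_X(x) = \begin{cases} \sum_i x_i \, P(X = x_i) & \text{if } X \text{ is discrete}, \\ \int_{\mathbb{R}} x f_X(x) \, \mathrm{d}x & \text{if } X \text{ has a density } f_X, \end{cases}$$

so the familiar discrete and continuous formulas are just two special cases of the same integral.)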

Is it right to say that doing probability with the general notion of a probability measure saves mathematicians work? What are the other advantages?

Please correct me if I'm wrong, but I hope you get the idea of what sort of information I'm after: the importance and role of measure theory in probability, and an answer to the question: are there theorems in probability that do not hold for a general probability measure, but are true only for discrete or only for continuous probability distributions? If we can prove that no such theorems exist, we can simply forget about the distinction between discrete and continuous distributions.

If someone could come up with a clear, concise summary, I'd be grateful. I'm not an expert, so please take that into account.


Solution 1:

Since the measure-theoretic axiomatization of probability was formulated by Kolmogorov, I think you'd be very much interested in this article. I had questions similar to yours, and most of them were cleared up after reading it - although I also read Kolmogorov's original work afterwards.

One of the ideas is that historically there were proofs of the LLN and CLT that made no explicit use of measure theory. However, Borel and Kolmogorov both started using measure-theoretic tools to solve probabilistic problems on $[0,1]$ and similar spaces, for example by treating the binary expansion of $x\in [0,1]$ as the coordinates of a random walk. The idea then became: this works well, so what if we use this method much more systematically, and even declare that this is the way to do probability? When Kolmogorov's work first came out, not every mathematician agreed with that claim (to say the least). But you are somewhat right in saying that measure theory makes dealing with probability easier - much like solving basic geometric problems with vector algebra.
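To give a rough sketch of that example (my paraphrase, not the original argument): if $d_n(x)$ denotes the $n$-th binary digit of $x \in [0,1]$, then under Lebesgue measure the digits $d_1, d_2, \dots$ are i.i.d. Bernoulli$(1/2)$ random variables, so

$$S_n(x) = \sum_{k=1}^{n} \bigl(2 d_k(x) - 1\bigr)$$

is a simple symmetric random walk, and statements about almost every $x \in [0,1]$ (with respect to Lebesgue measure) become almost-sure statements about the walk.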

Regarding facts exclusively available for discrete/continuous distributions: usually a good probabilistic theorem is quite general and works fine in both cases. However, there are some things that hold for "continuous" measures only. The proper name for continuous here is atomless: $\mu$ is atomless if every measurable set $F$ with $\mu(F) > 0$ has a measurable subset $E \subseteq F$ with $0 < \mu(E) < \mu(F)$. For an atomless measure the range of $\mu$ is convex, that is, for every $0 \leq c \leq \mu(\Omega)$ there exists a measurable set $C$ with $\mu(C) = c$. Of course, that does not hold for measures with atoms. Not a very probabilistic fact, though.
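To see the contrast concretely (a standard example, not specific to this answer): Lebesgue measure on $[0,1]$ is atomless and its range is all of $[0,1]$, while the two-point measure

$$\mu = \tfrac{1}{2}\delta_0 + \tfrac{1}{2}\delta_1$$

has range $\{0, \tfrac{1}{2}, 1\}$ - there is simply no set of measure $\tfrac{1}{4}$.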

Solution 2:

Two reasons why measure theory is needed in probability:

  1. We need to work with random variables that are neither discrete nor continuous, like $X$ below:

Let $(\Omega, \mathscr{F}, \mathbb{P})$ be a probability space and let $Z, B$ be random variables on $(\Omega, \mathscr{F}, \mathbb{P})$ s.t.

$Z \sim N(\mu,\sigma^2)$, $B \sim \operatorname{Bin}(n,p)$.

Consider the random variable $X = Z 1_A + B 1_{A^c}$ where $A \in \mathscr{F}$; depending on $A$, $X$ can be discrete, continuous, or neither.
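For instance, if we additionally assume that $A$ is independent of $(Z, B)$ and $0 < \mathbb{P}(A) < 1$ (assumptions I'm adding just to make the computation clean), the distribution function of $X$ is the mixture

$$F_X(x) = \mathbb{P}(A)\,\Phi\!\left(\frac{x - \mu}{\sigma}\right) + \bigl(1 - \mathbb{P}(A)\bigr)\,\mathbb{P}(B \leq x),$$

which has jump discontinuities (from the binomial part) and a continuous increasing part (from the normal part), so $X$ is neither discrete nor continuous.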

  2. We need to work with certain sets:

Consider $U \sim \operatorname{Unif}([0,1])$ with density $f_U = 1_{[0,1]}$, defined on the probability space $([0,1], \mathcal{B}([0,1]), \lambda)$ (the Borel $\sigma$-algebra rather than the power set; $\lambda$ is not defined on every subset of $[0,1]$, which is exactly the point of the last remark below).

In probability w/o measure theory:

If $(i_1, i_2) \subseteq [0,1]$, then $$P(U \in (i_1, i_2)) = \int_{i_1}^{i_2} 1 du = i_2 - i_1$$

In probability w/ measure theory:

$$P(U \in (i_1, i_2)) = \lambda((i_1, i_2)) = i_2 - i_1$$

So who needs measure theory, right? Well, what if we try to compute

$$P(U \in \mathbb{Q} \cap [0,1])?$$

We need measure theory to say $$P(U \in \mathbb{Q} \cap [0,1]) = \lambda(\mathbb{Q} \cap [0,1]) = 0$$

Riemann integration gives no answer for $$\int_{\mathbb{Q} \cap [0,1]} 1 \, du,$$ because the indicator of the rationals (the Dirichlet function) is not Riemann integrable.
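To spell that out: on any partition $0 = t_0 < t_1 < \dots < t_n = 1$, every subinterval contains both rationals and irrationals, so the upper and lower Darboux sums of $1_{\mathbb{Q} \cap [0,1]}$ are

$$\overline{S} = \sum_{k=1}^{n} (t_k - t_{k-1}) \cdot 1 = 1, \qquad \underline{S} = \sum_{k=1}^{n} (t_k - t_{k-1}) \cdot 0 = 0,$$

so they never meet and the Riemann integral does not exist, while the Lebesgue integral is simply $\lambda(\mathbb{Q} \cap [0,1]) = 0$.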

Furthermore, assuming the axiom of choice, there exists $A \subset [0,1]$ (e.g. a Vitali set) that is not Lebesgue measurable, so $P(U \in A)$ is undefined.


For a careful treatment of these points, see Rosenthal's A First Look at Rigorous Probability Theory.
Solution 3:

There's an exciting theorem about distribution functions (d.f.s) in Kai Lai Chung's A Course in Probability Theory, which states (in paraphrase): every d.f. can be written as a convex combination of a discrete d.f. and a continuous d.f., and this decomposition is unique;

or, with a refinement: every d.f. can be written uniquely as a convex combination of a discrete, an absolutely continuous, and a singular continuous d.f. This single statement subsumes all of the old-fashioned case analysis of discrete, continuous, and mixed distribution functions!
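As a toy illustration (my own example, not from Chung): take

$$F(x) = \tfrac{1}{2}\,\mathbf{1}_{[0,\infty)}(x) + \tfrac{1}{2}\min\bigl(\max(x, 0), 1\bigr),$$

the d.f. of a random variable that equals $0$ with probability $\tfrac{1}{2}$ and is uniform on $[0,1]$ with probability $\tfrac{1}{2}$. It decomposes as $F = \tfrac{1}{2} F_d + \tfrac{1}{2} F_{ac}$ with $F_d = \mathbf{1}_{[0,\infty)}$ (a point mass at $0$) and $F_{ac}(x) = \min(\max(x,0),1)$ (the Unif$[0,1]$ d.f.), with no singular continuous part; the Cantor function is the standard example where a singular continuous part does appear.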

This theorem (Theorem 1.3.2 in Chung) can't be proven without the powerful paradigm of Measure Theory.


Moreover, Measure Theory offers many more tools for studying Probability Theory. In fact:

  • The Strong Law of Large Numbers can NOT even be stated precisely, let alone proven, without Measure Theory (almost-sure convergence is a statement about an event of probability one).

  • You can NOT define Brownian Motion precisely without Measure Theory (see the sketch below).

  • You can NOT work with Stochastic Differential Equations without Measure Theory.
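To illustrate the second point (a standard definition, sketched here rather than quoted from a particular text): Brownian motion is a process $(B_t)_{t \geq 0}$ with

$$B_0 = 0, \qquad B_t - B_s \sim N(0,\, t - s) \ \text{independent of the past for } s < t, \qquad t \mapsto B_t \ \text{continuous almost surely},$$

and already the phrase "continuous almost surely" requires a probability measure on a space of paths (Wiener measure on $C([0,\infty))$), which is a genuinely measure-theoretic object.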