Why do we want probabilities to be *countably* additive?

In probability theory, it is (as far as I am aware) universal to equate "probability" with a probability measure in the sense of measure theory (possibly a particularly well-behaved measure, but never mind). In particular, we assume $\sigma$-additivity, but nothing more (say, additivity with respect to families of cardinality $\mathfrak{c}$, which would of course make things break down).

For me, as a mathematician, this is completely satisfactory, and until recently I hardly realised that it may not be entirely obvious that probability should behave this way. A sufficiently convincing justification for working with measures is that integration theory is precious, and we want to be able to use integrals to compute expected values, variances, moments and so on. And we cannot require any "stronger" kind of additivity, since then things fall apart already for the uniform distribution on $[0,1]$.
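To see concretely why stronger-than-countable additivity fails here, write $\lambda$ for the uniform (Lebesgue) measure on $[0,1]$. Every singleton has $\lambda(\{x\}) = 0$, yet $[0,1]$ is the union of continuum many singletons, so additivity over a family of cardinality $\mathfrak{c}$ would force
$$1 = \lambda([0,1]) = \lambda\Big(\bigcup_{x \in [0,1]} \{x\}\Big) = \sum_{x \in [0,1]} \lambda(\{x\}) = 0,$$
a contradiction. So countable additivity is already the strongest form of additivity the uniform distribution can support.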

However, recently I have had some interactions with non-mathematicians, who approach "higher" mathematics with some understandable uncertainty, but who still find the notion of probability relevant. One of the things this made me realise is that I am not myself fully aware why, in principle, we define things this way and not otherwise. Hence, after this overlong introduction, here is the question. Is there a fundamental reason why measure theory is the "only right way" to deal with probabilities (as opposed to, e.g., declaring probabilities to be merely finitely additive)? If so, is there a "spectacular" example showing why any other approach would not work? If not, is there an alternative approach (with any research behind it)?


Solution 1:

If you are interested in finitely additive probability theory, you should consult the works of Bruno de Finetti. The book How to Gamble if You Must: Inequalities for Stochastic Processes by Lester Dubins and Leonard J. Savage is based on finite additivity. I will quote from what the authors say about this in section 2.3 of that classic. Not everyone will agree with their approach, but it is difficult to argue that theirs is not a respectable point of view.

A gamble is, of course, a probability measure $\gamma$ on subsets of fortunes. In the tradition of recent decades, such a measure would be defined only on a sigma-field of subsets of $F$ and required to be countably additive on that sigma-field.

If this tradition were followed in this book, tedious technical measurability difficulties would beset the theory from the outset. (To see this kind of potential difficulty, formulate mathematically the problem corresponding to that of a gambler with an initial fortune $f$ who desires to maximize the probability that his fortune at the end of two gambles will be in a certain subset of the unit interval, where for each $g$ in the interval there is a set $\Gamma(g)$ of countably additive gambles defined only on the Borel subsets of the interval.) Experience and reflection have led us to depart from tradition and to assume that each $\gamma$ is defined for all subsets of $F$ and is not necessarily countably additive. This departure is further explained and justified in the next paragraphs.

The assumption that $\gamma$ is not defined for all sets would, all in all, complicate this chapter; the restriction to countably additive gambles would weaken its conclusions. Some of the new problems that the finitely additive approach does introduce are of mathematical interest in themselves.

When a gamble is specified in practice--even in the most mathematical practice--the specification will often define the value of the gamble only on some subclass of the class of all sets, perhaps on a Boolean algebra. For example, it might be specified that a certain gamble coincides with Lebesgue measure for the Lebesgue-measurable subsets of the unit interval. It is therefore essential to handle problems in which the gambles are not defined for all subsets of fortunes. One way to do this, suggested by tradition, is to carry a concept of measurability and integrability throughout the discussion, exploring the integrability of various functions that arise as candidates for integration, and to discuss upper and lower (or outer and inner) integrals when nonintegrable functions do arise.

A seemingly equivalent and, we find, much simpler method of handling problems where gambles are defined only on a subclass of sets is to consider all extensions of each such incompletely defined gamble to the class of all sets of fortunes. According to the Hahn-Banach theorem, such extensions exist in abundance, though in a very nonconstructive sense. If, for example, that gambler starting from \$1,000 can reach \$10,000 with probability $.07$ in every completion of an originally incompletely defined problem, is it not a sensible interpretation to credit him with at least that much in connection with the problem as originally specified? Likewise, if there is something he cannot achieve (or approach) under any extension, it ought not be regarded as achievable in the original problem. Finally, if something can be approached for some extensions but not for others, then the original problem must be recognized as not sufficiently specified to yield a definite answer.
. . .
De Finetti (1930, 1937, 1949, 1950, 1955, 1955a) has always insisted that countable additivity is not an integral part of the probability concept but is rather in the nature of a regularity hypothesis; his papers (1949) and (1950) are more particularly concerned with the conceptual aspects of this question than are the others cited; (1955) and (1955a) are mathematical papers about finite additivity. Personal contact with de Finetti gave us the courage to break with the traditional restrictions of countable additivity and to view countably additive measures much as one views analytic functions--as a particularly important special case.
. . .
Besides those of primary interest for this book, there are other reasons for pursuing the study of finitely additive measures. To mention but one, sometimes the only natural measure is not countably additive. A natural and intuitive example that does not yet seem to be in the literature is this. There is one and only one translation-invariant probability measure defined on the Boolean algebra generated by the arithmetic sequences. Under this measure, the set $\{\dots,-2a,-a,0,a,2a,\dots\}$ is necessarily assigned a probability of $1/a$. An obvious relation between this measure and the more familiar notion of the (long-run) density of a subset of the integers is this: The upper and lower densities of a set are between the upper and lower measures. Nathan Fine has told us interesting number-theoretic facts, not yet published, that have flowed from his study of the completion of this measure. Another finitely additive measure, suggested by de Finetti, is the one that assigns to every interval of rational numbers the distance between its endpoints. Any probability measure that assigns probability $1$ to some countable set, but probability $0$ to every finite set, will be called diffuse.
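To make the quoted density example concrete, here is a minimal Python sketch (the function name is my own, chosen for illustration) that estimates the long-run density of the multiples of $a$ among the integers; it approaches $1/a$, the value the translation-invariant finitely additive measure is forced to assign to $\{\dots,-2a,-a,0,a,2a,\dots\}$, since $\mathbb{Z}$ splits into $a$ translates of that set.

```python
def density_of_multiples(a, N):
    """Fraction of the integers in {-N, ..., N} that are divisible by a."""
    count = sum(1 for k in range(-N, N + 1) if k % a == 0)
    return count / (2 * N + 1)

if __name__ == "__main__":
    # The estimates converge to 1/a as N grows.
    for a in (2, 3, 7):
        print(a, density_of_multiples(a, 10**5), 1 / a)
```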

Here are the de Finetti references:

de Finetti, Bruno, 1930. Sulla proprietà conglomerativa delle probabilità subordinate. Rendiconti dell'Istituto Lombardo 63 414-418.

de Finetti, Bruno, 1937. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré 7 1-68.

de Finetti, Bruno, 1949. Sull'impostazione assiomatica del calcolo delle probabilità. Annali Triestini 19 29-81.

de Finetti, Bruno, 1950. Aggiunta alla nota sull'assiomatica della probabilità. Annali Triestini 20 [Series 4, Volume 4, second section (science and engineering)] 5-22.

de Finetti, Bruno, 1955. La struttura delle distribuzioni in un insieme astratto qualsiasi. Giornale dell'Istituto Italiano degli Attuari 28 16-28.

de Finetti, Bruno, 1955a. Sulla teoria astratta della misura e dell'integrazione. Annali di matematica pura ed applicata (Serie IV) 40 307-319.

Solution 2:

Terence Tao's free book on measure theory spends some time near the beginning developing "Jordan measure", which is a sort of finitely-additive version of Lebesgue measure.

As he points out, that theory is mostly fine as long as one only ever works with sets that happen to be Jordan measurable. However, as Tao proves in Remark 1.2.8, there are even open sets on the real line that are not Jordan measurable. Similarly, it turns out that $[0,1]^2 \setminus \mathbb{Q}^2$ (the points of the unit square with at least one irrational coordinate) is not Jordan measurable (Exercise 1.1.8).
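To sketch why the square example fails (writing $m_{*,(J)}$ and $m^{*,(J)}$ for inner and outer Jordan measure): the set $E = [0,1]^2 \setminus \mathbb{Q}^2$ and its complement in the square are both dense, so no box of positive area fits inside $E$, giving $m_{*,(J)}(E) = 0$, while any finite union of boxes covering $E$ has closure containing all of $[0,1]^2$, giving $m^{*,(J)}(E) = 1$. Hence
$$m_{*,(J)}(E) = 0 \neq 1 = m^{*,(J)}(E),$$
so $E$ has no Jordan measure, even though its Lebesgue measure is simply $1$ (the missing set $\mathbb{Q}^2 \cap [0,1]^2$ is countable, hence Lebesgue null).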

In general, I think Tao's presentation does show clearly the similarities and differences between Lebesgue and Jordan measure, although it takes some mathematical maturity to read it, so it might not help your friends.


Separately, countable additivity is important for a reason other than integration: many sets of interest in probability theory are $G_\delta$ or $F_\sigma$, and we want such sets to be measurable.

For a very specific example, it should be the case that a random real number in $[0,1]$ has infinitely many $3$s in its decimal expansion. Formally, this means that the set $U$ of irrationals in $[0,1]$ that have only finitely many $3$s in their decimal expansion should have measure $0$. Now, for each $k$, the set of irrationals in $[0,1]$ with $k$ or more $3$s in their decimal expansion is open as a subset of the irrationals. So the set $U$, being the union over $k$ of the complementary (relatively closed) sets, is $F_\sigma$ in $[0,1]\setminus \mathbb{Q}$, but it is neither open nor closed. So, if we did not have countable additivity of the measure, $U$ might not be measurable at all.
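For comparison, here is the computation that countable additivity makes possible. Write $d_j(x)$ for the $j$th decimal digit of $x$ and $\lambda$ for Lebesgue measure. For each fixed $n$ and every $m$,
$$\lambda\big(\{x \in [0,1] : d_j(x) \neq 3 \text{ for all } j > n\}\big) \le \lambda\big(\{x : d_j(x) \neq 3 \text{ for } n < j \le n+m\}\big) = \left(\tfrac{9}{10}\right)^m,$$
so each of these sets is null; since $U$ is contained in their union over $n$, countable additivity (in fact countable subadditivity) gives
$$\lambda(U) \le \sum_{n \ge 1} \lambda\big(\{x : d_j(x) \neq 3 \text{ for all } j > n\}\big) = 0.$$
Finite additivity alone does not justify this last step.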

This phenomenon happens more generally when we use the Baire category theorem to construct a real number with some desired property; that theorem naturally produces $G_\delta$ sets, not open or closed sets. The key benefit of countable additivity is that once open intervals are measurable, all Borel sets are measurable (and, moreover, all analytic sets, i.e. continuous images of Borel sets, are Lebesgue measurable). So, unless we really try, we are unlikely to construct nonmeasurable sets.