Why not define a measure as a function on functions?

Solution 1:

First of all, if one wants a "measure defined on a set $\mathcal{F}$ of functions" and if

1) $\mathcal{F}$ only contains functions measurable w.r.t. some $\sigma$-algebra $\mathcal{A}$,

2) $\mathcal{F}$ contains all indicator functions $1_A$ for $A \in \mathcal{A}$,

then it is just more economical to only require a measure to be defined on the sets $A \in \mathcal{A}$ and then use the usual measure-theoretic arguments/theorems to extend the integral.
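The first step of those "usual arguments" looks like this in the simplest case. The following is a minimal Python sketch, assuming a hypothetical finite measure space given by point weights. The measure is only ever evaluated on sets, and the integral of a simple function $\sum_i a_i 1_{A_i}$ is obtained by linearity as $\sum_i a_i \nu(A_i)$. The names `weights`, `measure`, `integrate_simple` and `evaluate` are illustrative only. For general measurable functions one would continue by monotone approximation with simple functions.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {"a", "b", "c", "d"} with point weights.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}

def measure(A):
    """nu(A) for a subset A of X; the measure itself is only defined on sets."""
    return sum((weights[x] for x in A), Fraction(0))

def integrate_simple(terms):
    """Integral of the simple function sum_i a_i * 1_{A_i}, given as [(a_i, A_i), ...].

    By linearity the value is sum_i a_i * nu(A_i); it does not depend on which
    representation of the simple function is used.
    """
    return sum((a * measure(A) for a, A in terms), Fraction(0))

def evaluate(terms, x):
    """Pointwise value of sum_i a_i * 1_{A_i} at the point x."""
    return sum(a for a, A in terms if x in A)

# Two representations of the same function: 3 on {a}, 2 on {b, c}, 0 on {d}.
rep1 = [(3, {"a"}), (2, {"b", "c"})]
rep2 = [(1, {"a", "b", "c"}), (2, {"a"}), (1, {"b", "c"})]
```

Both `integrate_simple(rep1)` and `integrate_simple(rep2)` return `Fraction(5, 2)`.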


But of course, there are cases where the two conditions above are not fulfilled. For example, one may start from a linear functional $\mu : C(K) \to \Bbb{R}$, where $C(K)$ is (e.g.) the space of continuous functions on some compact Hausdorff space $K$. If $\mu(f) \geq 0$ whenever $f \geq 0$, then the Riesz representation theorem shows that there actually is a measure $\nu$ such that $$ \mu(f) = \int f \, d \nu \qquad \forall f \in C(K). $$ The same holds if the space $C(K)$ is replaced by the space $C_c (\Omega)$ of all compactly supported continuous functions on a locally compact Hausdorff space $\Omega$.
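In the most trivial case the representing measure can be read off directly. The Python sketch below assumes that $K$ is a finite set with the discrete topology, so $K$ is compact Hausdorff and $C(K)$ consists of all real functions on $K$. It uses a made-up positive linear functional `mu` (hypothetical, as are the other names). The representing measure is then $\nu(\{x\}) = \mu(1_{\{x\}})$. This is only a toy illustration; for infinite $K$ the proof of the theorem requires a genuine construction.

```python
from fractions import Fraction

# Assumption: K is a finite set with the discrete topology, so it is compact
# Hausdorff and C(K) consists of all real functions on K (stored as dicts).
K = ["p", "q", "r"]

def mu(f):
    """A made-up positive linear functional on C(K), used only as a black box."""
    return 2 * f["p"] + Fraction(1, 3) * f["q"] + 0 * f["r"]

def indicator(point):
    """1_{point} as an element of C(K); it is continuous because K is discrete."""
    return {x: (1 if x == point else 0) for x in K}

def representing_measure(functional):
    """nu({x}) = functional(1_{x}); nu(A) is the sum of nu({x}) over x in A."""
    return {x: functional(indicator(x)) for x in K}

def integral(f, nu):
    """Integral of f with respect to the point masses nu."""
    return sum((f[x] * nu[x] for x in K), Fraction(0))

nu = representing_measure(mu)
```

Positivity of `mu` is exactly what makes every `nu[x]` nonnegative, and linearity gives `mu(f) == integral(f, nu)` for every `f`.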


There is even a more general construction, the Daniell integral, which is described well in the Wikipedia article and also in Section 4.5 of the (excellent) book "Real Analysis and Probability" by Richard M. Dudley (who calls it the Daniell-Stone integral).

Here I follow Dudley's approach. We start with a so-called vector lattice $F$ of functions $f : X \to \Bbb{R}$ on some base set $X$. This means that $F$ is a vector space and that $\max\{f,g\} \in F$ whenever $f,g \in F$.
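For a vector space $F$, closure under $\max$ is equivalent to closure under $f \mapsto |f|$. This is because $\max\{f,g\} = \tfrac12\left(f+g+|f-g|\right)$ and $|f| = \max\{f,-f\}$. Likewise, $\min\{f,g\} = -\max\{-f,-g\}$. The Python sketch below checks these identities pointwise for functions on a hypothetical finite base set, represented as tuples of values. All names are illustrative.

```python
# Hypothetical finite base set X = {0, 1, 2}; a function X -> R is a tuple of values.
def add(f, g):
    return tuple(a + b for a, b in zip(f, g))

def scale(c, f):
    return tuple(c * a for a in f)

def absval(f):
    return tuple(abs(a) for a in f)

def lattice_max(f, g):
    """max{f, g} = (f + g + |f - g|) / 2: only vector-space operations and |.| are used."""
    return scale(0.5, add(add(f, g), absval(add(f, scale(-1, g)))))

def lattice_min(f, g):
    """min{f, g} = -max{-f, -g}, so a vector lattice is closed under min as well."""
    return scale(-1, lattice_max(scale(-1, f), scale(-1, g)))

f = (1, -2, 3)
g = (0, 4, 3)
```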

Then, suppose that we are given a linear functional $\mu : F \to \Bbb{R}$ such that

1) If $f \in F$ satisfies $f \geq 0$, then $\mu(f) \geq 0$,

2) If a sequence $(f_n)_n$ in $F$ decreases pointwise to $0$ (i.e., $f_n \downarrow 0$), then $\mu(f_n) \to 0$.
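Condition 2) is not automatic. A standard example: let $X = \Bbb{N}$, let $F$ be the eventually constant functions (a Stone vector lattice), and let $\mu(f) = \lim_k f(k)$. This $\mu$ is linear and positive, but $f_n = 1_{\{n, n+1, \dots\}}$ decreases pointwise to $0$ while $\mu(f_n) = 1$ for all $n$. Hence no measure $\nu$ can represent $\mu$: since $0 \leq f_n \leq 1$ and $\int 1 \, d\nu = \mu(1) = 1$, dominated convergence would force $\int f_n \, d\nu \to 0$. Below is a minimal Python sketch of this example; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventuallyConstant:
    """f: N -> R with f(k) = prefix[k] for k < len(prefix) and f(k) = tail afterwards."""
    prefix: tuple
    tail: float

    def __call__(self, k):
        return self.prefix[k] if k < len(self.prefix) else self.tail

def mu(f):
    """'Value at infinity' mu(f) = lim_k f(k): linear and positive on this lattice."""
    return f.tail

def f_n(n):
    """Indicator of {n, n+1, ...}: zero on the first n points and one afterwards."""
    return EventuallyConstant(prefix=(0,) * n, tail=1)
```

For every fixed $k$, `f_n(n)(k)` is $0$ as soon as $n > k$, yet `mu(f_n(n))` stays equal to $1$.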

If, in addition, $F$ is a Stone vector lattice, i.e., if $\min\{f,1\} \in F$ for all $f \in F$, then there is a measure $\nu : \mathcal{A} \to [0,\infty]$, defined on some $\sigma$-algebra $\mathcal{A}$, such that all $f \in F$ are $\mathcal{A}$-measurable and $$ \mu(f) = \int_X f \, d\nu \qquad \forall f \in F. $$
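The Stone condition is what lets one recover sets from $F$. For $f \in F$ with $f \geq 0$, the functions $\min\{nf, 1\} \in F$ increase pointwise to $1_{\{f>0\}}$. So once $\mu(f) = \int_X f\,d\nu$ holds, monotone convergence gives $\nu(\{f>0\}) = \lim_n \mu(\min\{nf,1\})$. The Python sketch below assumes a hypothetical setting: $X=[0,1]$, $F = C[0,1]$, $\mu$ the integral over $[0,1]$ approximated by the midpoint rule, and $f(x)=\max\{x-\tfrac12,0\}$. Here $\mu(\min\{nf,1\}) = \tfrac12 - \tfrac1{2n}$ for $n \geq 2$, which approaches $\nu((\tfrac12,1]) = \tfrac12$.

```python
# Hypothetical setting: X = [0, 1], mu(f) = integral of f over [0, 1],
# approximated here by the midpoint rule with `steps` subintervals.
def mu(f, steps=100_000):
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

def f(x):
    """A nonnegative continuous function with {f > 0} = (1/2, 1]."""
    return max(x - 0.5, 0.0)

def truncated(n):
    """min{n f, 1}: in F by the Stone condition; increases to 1_{f > 0} as n grows."""
    return lambda x: min(n * f(x), 1.0)

ns = (10, 100, 1000)
approximations = [mu(truncated(n)) for n in ns]
```

The entries of `approximations` are approximately $0.45$, $0.495$ and $0.4995$, increasing towards $\nu(\{f>0\}) = \tfrac12$.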

Hence, in most reasonable cases, there is not much difference between the two approaches.

Furthermore, the measure $\nu$ is uniquely determined by the properties above on the smallest $\sigma$-ring for which all $f \in F$ are measurable.


Finally, I have seen the notation $\mu(f)$ sometimes, in particular (IIRC) in the book "Foundations of Modern Probability" by Olav Kallenberg, so it seems to be fairly common in the probability community.