What's the difference between a random variable and a measurable function?

I've tried to wrap my head around the measure-theoretic definition of a random variable for a couple of days now.

In his book Probability and Stochastics, Erhan Çinlar defines a measurable function as follows:

Let (E, ℰ) and (F, Ƒ) be measurable spaces [where ℰ and Ƒ are σ-algebras on the sets E and F respectively]. A mapping f : E → F is said to be measurable relative to ℰ and Ƒ if

f⁻¹B ∈ ℰ for every B in Ƒ.

Later, he defines a random variable as follows:

Let (Ω, H, ℙ) be a probability space. The set Ω is called the sample space; its elements are called outcomes. The σ-algebra H may be called the grand history; its elements are called events.

[...]

Let (F, Ƒ) be a measurable space. A mapping X : Ω → F is called a random variable taking values in (F, Ƒ) provided that it be measurable relative to H and Ƒ, that is, if

X⁻¹A = {X ∈ A} := {ω ∈ Ω : X(ω) ∈ A} is an event [i.e. ∈ H] for every A in Ƒ

Aside from using (Ω, H) instead of (E, ℰ), these definitions look pretty identical to me. What's the difference? Are all measurable functions on probability spaces random variables? (And why is it called random if it's deterministic?)


Solution 1:

Formally, they are the same: a random variable is a particular type of measurable function. However, there are important differences which may be more philosophical than mathematical.

There is a (somewhat subjective) difference in the underlying domains. A random variable operates on a set of outcomes of a random experiment or process. A measurable function normally does not (otherwise it's called a random variable). This isn't a mathematical difference per se, since the underlying domains are just sets; the $\sigma$-algebras and measures provide the relevant mathematical structure.

For a probability space, though, even if no specific random process is mentioned, one might argue there is an implied assumption that such a random process 'exists'. Otherwise, there would be no need for probabilists to have their own language: probability vs. measure, almost surely (or almost always) vs. almost everywhere, etc.

A philosophical note: A random variable can only be evaluated by performing an unpredictable experiment, i.e. you can't truly choose which $\omega$ to plug in. If you did choose a specific $\omega$, then you aren't evaluating a random variable, you're just evaluating a measurable function. A regular measurable function can always be evaluated however you please. This is a subtle, more philosophical point, but mathematicians should be free to discuss such topics. It is an important part of developing a deep understanding, similar to the way that physicists discuss the existence of the objects they study.

The set $\Omega$ is often not explicitly described. Even when $\Omega$ is clearly defined, a note is often added stating that certain details are being ignored in order to define $\Omega$. In practice, $\Omega$ is usually mapped to another set $A$ via a random variable $X:\Omega\rightarrow A$, thereby inducing a probability measure on $A$, which is a more tractable space.
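In symbols (a standard definition, with $\mathcal{A}$ denoting the $\sigma$-algebra on $A$): the induced measure, usually called the distribution or law of $X$, is
$$\mu_X(B) := \mathbb{P}\left(X^{-1}(B)\right) = \mathbb{P}\left(\{\omega\in\Omega : X(\omega)\in B\}\right),\qquad B\in\mathcal{A},$$
and measurability of $X$ is exactly what guarantees that each $X^{-1}(B)$ is an event with a well-defined probability.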

An example: Let $X$ be the random variable representing the outcome of a fair coin flip; $\Omega$ can be thought of as the set of outcomes of that coin flip (actual physical outcomes, not numbers or letters). Of course, we usually map $\Omega$ to $\{0,1\}$ to do calculations. The random variable $X$ acts on $\Omega$, though, not on $\{0,1\}$. So the value of $X$ depends on the actual outcome of a real physical coin flip, which you cannot predict perfectly, i.e. it's random. Once you are working in $\left(\{0,1\},2^{\{0,1\}},\nu\right)$, where $\nu$ is simply a normalized counting measure, we can think of $X$ as the deterministic identity function $X(i)=i$. But the probability space is $\left(\Omega,\mathcal{F},\mathbb{P}\right)$, on which $X$ is a random variable. Usually, $X(\omega)=\mathbf{1}_{\{\text{upper face on coin is heads}\}}(\omega)$. So $X:\left(\Omega,\mathcal{F},\mathbb{P}\right)\rightarrow\left(\{0,1\},2^{\{0,1\}},\nu\right)$ thus maps a probability space to a measure space, in a sense.
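To make the induced measure concrete, here is a minimal Python sketch. The outcome labels and their weights are illustrative assumptions (a toy finite stand-in for the physical $\Omega$), not part of the example above; the point is only that $\nu$ is computed by summing $\mathbb{P}$ over preimages under $X$.

```python
from fractions import Fraction

# Toy finite sample space: labels standing in for physical outcomes.
# These outcomes and their weights are illustrative assumptions.
P = {
    "heads, 3 flips in the air": Fraction(1, 4),
    "heads, 4 flips in the air": Fraction(1, 4),
    "tails, 3 flips in the air": Fraction(1, 4),
    "tails, 4 flips in the air": Fraction(1, 4),
}  # weights sum to 1, so this is a probability measure on Omega

def X(omega: str) -> int:
    """Indicator of the event 'upper face of coin is heads'."""
    return 1 if "heads" in omega else 0

# Induced (pushforward) measure on {0, 1}: nu({b}) = P(X^{-1}({b})).
nu = {b: sum(p for omega, p in P.items() if X(omega) == b) for b in (0, 1)}
print(nu)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```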

Even reducing $\Omega$ to $\{\omega_1,\omega_2\}$ is an oversimplification. With a coin flip, there could be all sorts of variables to include, such as the length and shape of the flight path of the coin, how many times it flipped in the air, the location and angle of rotation after it lands, etc. All of these could result in a large number of $\omega$'s.

But you could partition $\Omega=H\cup T\cup N$, where $$H=\{\omega| \text{upper face of coin is heads}\},$$ $$T=\{\omega| \text{upper face of coin is tails}\},$$ and $$N=\{\omega| \text{upper face of coin is neither heads nor tails}\}.$$ For a fair coin, $\mathbb{P}(N)=0$ and $\mathbb{P}(H)=\mathbb{P}(T)=1/2$.
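As a one-line check connecting this partition back to the random variable: the event $\{X=1\}$ is exactly $H$, so
$$\mathbb{P}(X=1)=\mathbb{P}\left(X^{-1}(\{1\})\right)=\mathbb{P}(H)=\frac{1}{2}.$$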

Often, $\Omega$ is ignored entirely. In the above example, instead of writing $P(\{\omega|X(\omega)=1\})=1/2$, we simply write $P(1)=1/2$.

Solution 2:

In the theory of probability as developed by Kolmogorov, random variables just are measurable functions (provided that the underlying space is a probability space, which is just a space with a normalized finite measure).

That's actually the key idea of Kolmogorov's theory. It allows us to rigorously encode everything from elementary probability theory into measure spaces and measurable functions. In this formulation, the expected value is just a linear functional (and higher moments are integrals of powers), which lets one use all of the results of measure theory and functional analysis (inequalities, convergence theorems, etc.) as tools for probabilistic calculations.
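For concreteness, these are the standard facts of that formulation: the expected value is the Lebesgue integral of $X$ against $\mathbb{P}$, linearity is inherited from the integral, and the $k$-th moment is the integral of $X^k$:
$$\mathbb{E}[X]=\int_\Omega X\,d\mathbb{P},\qquad \mathbb{E}[aX+bY]=a\,\mathbb{E}[X]+b\,\mathbb{E}[Y],\qquad \mathbb{E}\left[X^k\right]=\int_\Omega X^k\,d\mathbb{P}.$$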