Typically, in the applied probability or statistics literature, we work with random variables whose domain we don't specify; we only care about the set in which the random variables take values.

For example, the number of aces in a hand in a certain card game, the heights in a population, or the income of a company in a certain year are all random variables (the last two examples come from statistics). But in none of these examples is the domain ever given.

While we could always construct any number of artificial probability spaces that would serve as the domain, I'm interested in what a compelling probability space could serve as the domain, one that really models the underlying experiment of these three examples.

EDIT: To avoid confusion about what I mean by "compelling", let me be more precise by giving an example. Consider the random variable that counts the number of heads when flipping a coin $n$ times; it takes values in $\{0,1,\ldots,n\}$. But which experiment would most likely be performed in order to produce these values?
The most compelling space would be $\Omega=\{H,T\}^n$, the space of sequences of $n$ coin flips, since this is what actually happens.
But one could just as well define this random variable on the set $\{0,1,\ldots,n\}$ (equipped with the corresponding binomial weights), in which case the random variable would be the identity function. This space I would call artificial, not "compelling", because it no longer gives an accurate representation of the underlying experiment.
In particular I'm interested in the underlying space for the statistical examples.

P.S. See also this other question of mine, which also has a bounty running.


Solution 1:

Here is a paper that discusses your question (see page 3). See also this Stack Overflow question: what are the sample spaces when talking about continuous random variables.

As you'll see, random variables are not required in order to use probability theory; they are just convenient ways to capture the aspects of the underlying sample space we are interested in. We could choose to work directly with the underlying sample space if we knew it (as a running example I will use $\Omega = \{H,T\}^N$ for an $N$-coin-toss experiment).

Basically, the decision to model outcomes of an experiment as random variables or to treat them as direct observations of the sample space is mostly a matter of perspective. The random-variables view separates the object itself (possibly an abstract object) $\omega \in \Omega$, e.g. the outcome "HH", from the questions we can ask about it ("number of tails", "number of heads", "at least one tail", "no more than two heads", etc.).

If you only care about one question, then the two views are isomorphic. However, if you want to ask multiple questions about the same observational unit, then the random-variables view is more consistent with what you are trying to do. For example, suppose you ask the height and weight of 100 randomly chosen people; in this case a random-variables view makes more sense, as "height" and "weight" are not independent objects in the real world that "just happen" to be correlated: they are linked through people ($\omega \in \Omega$).

So, let's say I gave you the underlying sample space $\Omega$ for a problem. Now what? You will want to ask questions about the probability of various events, defined as measurable sets with elements from $\Omega$ (e.g., all outcomes with exactly three heads). There are two ways to do this:

  1. Create the set of all $\omega \in \Omega$ that have exactly three heads and then calculate the probability of this set.
  2. Define an integer-valued random variable $X(\omega)$ that returns the number of heads in $\omega$. This creates a new sample space, the image of $X(\omega)$, along with an induced probability measure $P'$ defined over the integers $0$ to $N$. This induced measure is called a pushforward measure (or image measure). Now you can re-cast your question as $P'(X=3)$, as opposed to $P(\{\omega \in \Omega: \#\text{Heads}(\omega) = 3\})$ using the original space. (Both routes are sketched in the code below.)
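
For concreteness, here is a minimal sketch of the two routes for a small fair-coin example (the fairness of the coin and the choice $N=5$ are just for illustration):

```python
from itertools import product
from collections import Counter

# Sample space for N fair coin tosses: Omega = {H,T}^N with the uniform measure P.
N = 5
omega_space = list(product("HT", repeat=N))
p = 1 / len(omega_space)                      # P({omega}) = 2^{-N}

# Route 1: build the event {omega : #Heads(omega) = 3} and sum P over it.
event = [w for w in omega_space if w.count("H") == 3]
prob_direct = len(event) * p

# Route 2: push P forward through X(omega) = #Heads(omega) to get the induced
# measure P' on {0, ..., N}, then read off P'(X = 3).
pushforward = Counter(w.count("H") for w in omega_space)
prob_pushforward = pushforward[3] * p

print(prob_direct, prob_pushforward)          # both equal C(5,3) / 2^5 = 0.3125
```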

You are probably familiar with this stuff -- however, you want to know why we bother with it. In the case of the analysis of a single random variable, we can very well re-define our sample space by using the induced sample space (or simply define a sample space to match the properties of the random variable).

This changes when we move to jointly distributed random variables. Without $\Omega$ (at least implicitly), we'd have no way to index joint observations. Here's an example:

Let's say you sample 5 values from each of two random variables, $X$ and $Y$:

  • Observed X's = $1,1,2,5,3$
  • Observed Y's = $0,1,1,0,1$

Now, you want to develop a joint distribution that describes these observations as random variables (i.e., as different aspects of some common object). How will you do this? Most importantly, you first need to associate an observation from $X$ with an observation from $Y$. Implicit in this association is the assumption that there is some common sample space $\Omega_J$ that justifies associating, say, the first observation of $X$ with the first observation of $Y$ to form the joint observation $(1,0)$.

So, in my example, we are assuming there is some underlying event $\omega'\in \Omega_J$ such that $X(\omega')=1$ and $Y(\omega')=0$ and that there is a valid underlying probability space $(\Omega_J,\mathcal{F}_J,P_J)$ whose image will produce the observed joint distribution of $(X,Y)$.
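
Here is a toy sketch of that indexing point, using the observed values above and treating the underlying outcomes as purely abstract labels (the labels $\omega_1,\ldots,\omega_5$ are of course just placeholders):

```python
# Treat the underlying outcomes as abstract labels omega_1, ..., omega_5 and
# view the observed values as X(omega_k) and Y(omega_k): the pairing
# (1, 0), (1, 1), ... is only meaningful because X and Y are evaluated at the
# same omega.
Omega_J = ["omega_%d" % k for k in range(1, 6)]
X = dict(zip(Omega_J, [1, 1, 2, 5, 3]))   # observed X's
Y = dict(zip(Omega_J, [0, 1, 1, 0, 1]))   # observed Y's

joint = [(X[w], Y[w]) for w in Omega_J]
print(joint)   # [(1, 0), (1, 1), (2, 1), (5, 0), (3, 1)]
```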

However, we could dispense with all of this if we chose to model $X,Y$ not as random variables but as direct observations (the integers are our experimental units or foundation data).

At this point, you may still be unconvinced of the usefulness of the sample space view...

So, let's say you develop your distribution of $X,Y$ directly (no sample space, i.e., domain-less in your terminology), and then you want to add a new quantity $Z$. How do you do this? Without an underlying sample space you need to develop the new joint distribution manually from first principles (i.e., ad hoc), whereas invoking the idea of an underlying sample space makes extending joint distributions a natural consequence of defining a new function on the same (usually implicit) underlying probability space. The fact that this can always be assumed is a major theoretical elegance of modern probability theory.
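
Continuing the toy sketch from above (re-declared so the snippet stands on its own; the particular choice of $Z$ is arbitrary and only for illustration):

```python
# Adding a new quantity Z is just defining another function on the same
# underlying outcomes; no ad hoc re-modelling of the joint distribution of
# (X, Y, Z) is needed.
Omega_J = ["omega_%d" % k for k in range(1, 6)]
X = dict(zip(Omega_J, [1, 1, 2, 5, 3]))
Y = dict(zip(Omega_J, [0, 1, 1, 0, 1]))
Z = {w: X[w] + Y[w] for w in Omega_J}     # e.g. Z(omega) := X(omega) + Y(omega)

print([(X[w], Y[w], Z[w]) for w in Omega_J])
```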

Again, it's a matter of perspective, but the random variables view, at least to me, has a philosophical/conceptual elegance to it when you consider joint observations and stochastic processes.

Here is a nice post on MathOverflow that discusses something similar.

Solution 2:

If you are only interested in a specific collection of random variables, then I would argue that the most "natural" setting is to look at their joint law, as you have done for the coin tosses. For example, if $(Z_n)$ is an iid sequence of $\mathcal N(0,1)$ random variables, we would have $\Omega=\mathbb R^{\mathbb N}$, $\mathcal F$ the (Borel) product $\sigma$-field, and $\mathbb P$ the probability measure such that $$\mathbb P\left\{\omega\in\mathbb R^{\mathbb N}\,:\,\omega(n)\in A_n\text{ for }n=1,\ldots,N\right\}=\prod_{n=1}^N\frac1{\sqrt{2\pi}}\int_{A_n}e^{-x_n^2/2}\,dx_n$$ for all Borel sets $A_1,\ldots,A_N$ (recall this uniquely defines a probability measure by Kolmogorov's theorem). In this case we have $Z_n(\omega):=\omega(n)$.

Of course, independent random variables are a trivial example, but the basic idea remains in far greater generality: suppose for each $i\in I$ we have a measurable space $(E_i,\mathcal A_i)$ and a random variable $X_i$ taking values in $E_i$, such that $$\mathbb P(X_i\in A_i\text{ for }i\in F)=:p\bigg(F,\prod_{i\in F}A_i\bigg)$$ is known for every finite $F\subset I$ and every collection of measurable sets $A_i\in\mathcal A_i$. Consider $\Omega:=\prod_{i\in I}E_i$, $\mathcal F$ the product $\sigma$-field of the $\mathcal A_i$'s, and the unique probability measure $\mu$ on $\mathcal F$ such that $$\mu\{\omega\in\Omega\,:\,\omega(i)\in A_i\text{ for }i\in F\}=p\bigg(F,\prod_{i\in F}A_i\bigg).$$ As far as the collection of random variables $(X_i)$ is concerned, $(\Omega,\mathcal F,\mu)$ knows as much as our original probability space does, so we may as well assume our original space was $(\Omega,\mathcal F)$ with $\mathbb P=\mu$. Again, in this case we have $X_i(\omega)=\omega(i)$.
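
To unpack one consequence of the first display (just spelling out what is already stated there): taking $A_m=\mathbb R$ for every $m\neq n$, the product on the right collapses to a single factor, giving
$$\mathbb P(Z_n\in A)=\mathbb P\left\{\omega\in\mathbb R^{\mathbb N}\,:\,\omega(n)\in A\right\}=\frac1{\sqrt{2\pi}}\int_A e^{-x^2/2}\,dx,$$
so each coordinate map $Z_n(\omega)=\omega(n)$ is indeed $\mathcal N(0,1)$, while the product form for general $A_1,\ldots,A_N$ is exactly the independence of $Z_1,\ldots,Z_N$.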

This set-up is not perfect, of course. For a simple example, take Brownian motion $(B_t)$. We would like to say $\mathbb P(t\mapsto B_t\text{ is continuous})=1$, but under the product $\sigma$-field this event is not even measurable. There are ways to work around such problems (in this specific case one uses Kolmogorov's continuity theorem), but they are usually handled on a case-by-case basis.

Another issue is when you are looking at a sequence of spaces. Consider for instance particles on the discrete $N$-torus performing symmetric simple exclusion. Explicitly, each particle independently performs a (continuous time) simple random walk on the the torus $\mathbb T_N:=\mathbb Z/N\mathbb Z$, but if a particle attempts to jump to a position which is already occupied, no jump occurs. It is interesting to consider the asymptotics of such a process, i.e. what happens as $N$ becomes large. But for distinct $N$, we necessarily require different probability spaces. How does it make sense to consider different $N$ simultaneously? What would be the probability of an event of the form $$\{\text{the system on the $N$-torus is at state $A$}\}\cap\{\text{the system on the $M$-torus is at state $B$}\}?$$
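
If it helps to have something concrete in mind, here is a rough simulation sketch of this exclusion process, assuming the convention that each particle attempts a jump at rate $1$ (rate $1/2$ to each neighbour); other rate conventions are possible, and nothing here is specific to the measure-theoretic point being made:

```python
import random

# Symmetric simple exclusion on the discrete N-torus Z/NZ: each particle
# attempts nearest-neighbour jumps at rate 1 (rate 1/2 to each side); an
# attempted jump onto an occupied site is suppressed.
def simulate_ssep(N, initial_sites, t_max, seed=0):
    rng = random.Random(seed)
    occupied = set(initial_sites)            # occupied sites in {0, ..., N-1}
    t = 0.0
    while occupied:
        t += rng.expovariate(len(occupied))  # next attempt: total rate = #particles
        if t > t_max:
            break
        x = rng.choice(tuple(occupied))      # the particle whose clock rings
        y = (x + rng.choice((-1, 1))) % N    # proposed neighbouring site
        if y not in occupied:                # exclusion rule
            occupied.remove(x)
            occupied.add(y)
    return occupied

# Example: 3 particles on the 10-torus, run up to time 5.
print(sorted(simulate_ssep(10, {0, 1, 5}, 5.0)))
```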

This is why we don't usually bother with the probability space too much. We know it exists: Kolmogorov's theorem guarantees that in many cases, and when there are delicate points like continuity, there are theorems that get around the problem. So we ignore it. Why don't we care more about the space? It's not that it is something so abstract we couldn't possibly begin to understand it; rather, we are almost without exception interested in some collection of random variables, and knowing everything we can about their joint law is enough.

EDIT: To address your questions below.

$1)$ The Kac-Rice formula provides a method for computing the (expected) number of zeroes of a smooth Gaussian field. The source I have linked deals exclusively with the case where the field depends only on finitely many iid $\mathcal N(0,1)$ random variables, in which case the existence of an appropriate probability space is dealt with via our earlier example. However, the Kac-Rice formula still holds for a more general smooth Gaussian field $F:U\rightarrow\mathbb R$ for some open $U\subset\mathbb R^N$; only now we need to be careful about what conditions we place on the correlations. Without a certain degree of correlation, we cannot hope to have a smooth field (e.g. if $\{F(x)\}_{x\in U}$ are iid then obviously $F$ is not smooth, or even continuous). Once we have appropriate conditions, the approach is similar to the Brownian motion case: first we construct a field in the standard (i.e. Kolmogorov consistency theorem) way, and then we show there is a smooth version.
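
For orientation, in the one-dimensional case (a Gaussian process $f$ on an interval $[a,b]$, sufficiently smooth and nondegenerate; I am stating this informally rather than with the precise hypotheses of the linked source), the formula takes the form
$$\mathbb E\Big[\#\{t\in[a,b]\,:\,f(t)=0\}\Big]=\int_a^b\mathbb E\big[\,|f'(t)|\;\big|\;f(t)=0\,\big]\,p_{f(t)}(0)\,dt,$$
where $p_{f(t)}$ denotes the density of $f(t)$.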

$2)$ I don't believe there is a sensible answer to this question. This is common when we model particle systems using probability theory: we assume there are $N$ particles and construct our model, then we see what happens if $N$ is large (which seems sensible, since if our system is macroscopic we would expect something on the order of $10^{20}$ particles). We are not assuming that, for instance, we have some global space to which we keep adding more particles; for each $N$, the models are distinct. As you may imagine, one needs to think about what it means for such a system to "converge": typically, we will identify the state of the system with some empirical measure and then consider weak convergence of measures.
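
To give a flavour of what that identification looks like (this is the usual convention in the hydrodynamic-limit literature, not something specific to the discussion above): writing $\eta_t(x)\in\{0,1\}$ for the occupation of site $x$ at time $t$, one forms the empirical measure on the rescaled torus,
$$\pi^N_t:=\frac1N\sum_{x\in\mathbb T_N}\eta_t(x)\,\delta_{x/N},$$
and "convergence" of the systems as $N\to\infty$ is then understood as weak convergence of these random measures.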

$3)$ One must be careful to remember that random variables, and probability theory in general, are a model for statistics; they are NOT the same thing. So strictly speaking, there isn't some abstract probability space underlying people's heights; height is deterministic, and it is simply a matter of whom you choose to survey. Of course, it may be extremely useful to model such an experiment as drawing a sample of size $N$ from a particular distribution. I would argue that the most "natural" probability space for this model would be the joint law of an (infinite) independent sequence of random variables with the given distribution. So for example, if our distribution were the standard normal (which is obviously absurd for height, but you get the idea), then the natural space is the very first example I gave. As we saw there, this probability measure is more than capable of dealing with only a finite number of random variables, and it has the advantage that it does not matter what your sample size $N$ is: you could keep surveying more people if you wanted. Again, this is getting into how you choose to model a specific experiment or problem, and it is important not to assert that there is a "true" probability space.

Solution 3:

The sample space is usually chosen as the simplest and easiest-to-understand representation of the relevant aspects of the system.

For the experiment of flipping $n$ coins, the sample space $\{H,T\}^n$ is good because all of its $2^n$ outcomes are equally likely. One could alternatively use the sample space $\{0,1,\ldots,n\}$, but then one needs to assign unequal probability masses, and those would be computed with the equally-likely model for $\{H,T\}^n$ in mind.
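
Concretely, if $X$ denotes the number of heads, the masses one would assign on $\{0,1,\ldots,n\}$ are obtained by counting equally likely outcomes in $\{H,T\}^n$ with exactly $k$ heads:
$$\mathbb P(X=k)=\binom{n}{k}2^{-n},\qquad k=0,1,\ldots,n.$$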

In standard probability courses, a lot of probability is done even before defining random variables. Abstract sample spaces are used and axioms are defined. This shows how a variety of different situations can be treated and emphasizes important concepts of outcome and event.

The topic of random variables comes up only later in standard courses. Random variables are good ways to represent quantities and events of interest. It is encouraging to know that random variables have a direct connection with sample spaces and the probability axioms (so the same probability theory that was learned before still applies). It is also useful to recognize that many different probability experiments can be modeled by the same kinds of random variables, i.e., variables with the same cumulative distribution functions (CDFs). In some cases it is easier to work directly with the CDFs (or joint CDFs for random vectors) rather than describing a sample space. This is a way of representing the problem very simply, where the random variables or vectors are just identity functions on $\mathbb{R}$ or $\mathbb{R}^n$.
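
For a single random variable this canonical construction can be made explicit (a standard fact, spelled out here for completeness): given a CDF $F$, take $\Omega=\mathbb{R}$ with the Borel $\sigma$-field and the unique probability measure $\mu$ satisfying $\mu((-\infty,x])=F(x)$ for all $x$; then the identity map $X(\omega):=\omega$ is a random variable on this space with CDF exactly $F$.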