What does it mean to sample, in measure-theoretic terms?

The simple answer is that in measure theory, we will talk about distributions of draws and not work with the individual draws themselves. There are ways to define uncountably many draws from a continuous distribution sensibly, but they are highly nontrivial.

Your favorite programming language sidesteps the problem by approximation: it draws from a large but finite set of floating-point values rather than from the continuum.
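
To make that approximation concrete, here is a minimal sketch, assuming CPython's `random.random()`, which is documented to produce floats with 53 bits of precision. The helper `on_grid` and the constant `GRID` are names made up for this illustration. Every draw lands on a finite grid of multiples of $2^{-53}$, so each attainable value carries a tiny but positive probability, unlike a single point under a genuinely continuous distribution.

```python
import random

GRID = 2 ** 53  # random.random() yields 53-bit floats, i.e. multiples of 2**-53


def on_grid(x: float) -> bool:
    """Return True if x is an exact multiple of 2**-53."""
    # Scaling by a power of two is exact in binary floating point.
    return (x * GRID).is_integer()


rng = random.Random(0)  # seeded only so the sketch is repeatable
draws = [rng.random() for _ in range(10_000)]

# Check that every draw sits on the finite grid and inside [0, 1).
all_on_grid = all(on_grid(x) for x in draws)
all_in_unit_interval = all(0.0 <= x < 1.0 for x in draws)
print(all_on_grid, all_in_unit_interval)
```

So the "uniform on $[0,1]$" of a typical language is really a discrete uniform distribution on about $2^{53}$ points.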

Btw: it all depends on the measure. Nothing prevents me from defining a probability measure on an uncountable domain that picks a certain element with probability $1$.
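
The standard example of such a measure is the point mass (Dirac measure) at a fixed element. The symbols $\Omega$, $x_0$ and $\delta_{x_0}$ are notation chosen here for illustration: for a fixed $x_0 \in \Omega$ and any measurable $A \subseteq \Omega$,

$$
\delta_{x_0}(A) =
\begin{cases}
1 & \text{if } x_0 \in A, \\
0 & \text{if } x_0 \notin A,
\end{cases}
\qquad \text{so in particular } \delta_{x_0}(\{x_0\}) = 1 .
$$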


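You don't even have to go to uncountable domains for this to occur. Consider a countably infinite sample space such as the rationals in the interval $\left[0,1\right]$. If all outcomes are to be equally likely, the probability of any single rational must be $0$. (Strictly speaking, such a model cannot be countably additive: summing countably many zeros gives total probability $0$, not $1$, so it only makes sense as a finitely additive idealization.)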

But that's reasonable because it doesn't make sense to measure individual sample points when the possible outcomes are infinite and equally likely. This is really what you allude to in your last paragraph.

I think the paradox exists because this type of probability model, countable or uncountable, doesn't really represent real-world situations where an actual sample is taken. For a theoretically unbounded, countable sample space such as $\mathbb{Z}^+$, in practice there is normally either some upper bound on the possible outcomes or else the probabilities decrease as the outcomes grow large. In either case, the lower-valued (more realistic) outcomes each have a small but positive probability.
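
As a sketch of the second option, probabilities that shrink as the outcomes grow, take the geometric weights $p(n) = 2^{-n}$ on $\mathbb{Z}^+$. This particular choice is an assumption made purely for illustration, and `p` and `partial_sum` are hypothetical helper names. Every outcome gets positive probability and the total mass is still $1$.

```python
from fractions import Fraction


def p(n: int) -> Fraction:
    """Illustrative weight of outcome n in Z^+: p(n) = 2**-n."""
    return Fraction(1, 2 ** n)


def partial_sum(N: int) -> Fraction:
    """Exact total probability of the outcomes 1..N."""
    return sum((p(n) for n in range(1, N + 1)), Fraction(0))


# The mass not yet accounted for after N terms is exactly 2**-N,
# so the partial sums approach 1 while every p(n) stays positive.
missing_after_50 = 1 - partial_sum(50)
print(p(1), p(10), missing_after_50)
```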

For the bounded case, say the reals in the interval $\left[0,1\right]$, again no real-world situation really matches up. There is always some limit on the accuracy of measurement (the number of decimal places, if you like) of each outcome. So in terms of measurable sets you are really dealing with intervals in $\left[0,1\right]$, however tiny they might be. Were this not so, a number with $5$ billion decimal places would have to be just as likely as $0$ or $0.1$ or $0.345$, which would be difficult to justify.
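
A rough simulation of that point, assuming a hypothetical instrument that records only two decimal places (`DECIMALS` and `BINS` are made-up names): each recorded value stands for an interval of length $10^{-2}$, and it is the interval, not the point, that carries probability of about $0.01$.

```python
import random
from collections import Counter

DECIMALS = 2            # hypothetical instrument precision
BINS = 10 ** DECIMALS   # number of distinguishable readings in [0, 1)

rng = random.Random(0)  # seeded only so the sketch is repeatable
n_draws = 100_000

# Truncating to DECIMALS places sends every draw in [k/BINS, (k+1)/BINS)
# to the same recorded reading k/BINS.
counts = Counter(int(rng.random() * BINS) for _ in range(n_draws))

# Empirical frequency of each reading; each should be near 1/BINS.
freqs = {k / BINS: c / n_draws for k, c in sorted(counts.items())}
print(len(freqs), min(freqs.values()), max(freqs.values()))
```

Raising `DECIMALS` makes the intervals smaller and their probabilities smaller, but at any finite precision they stay positive.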

That's why it makes sense to work with $\sigma$-fields of subsets of $\Omega$. The need for $\sigma$-fields is more obvious with infinite sample spaces than with finite ones.

I don't know if that helps at all.