Precise definition of the support of a random variable

I am not entirely convinced by the line "the sample space is also called the support of a random variable".

That looks quite wrong to me.

What is even more confusing is: when we talk about the support, do we mean that of $X$ or that of the probability measure $\Pr$?

In rather informal terms, the "support" of a random variable $X$ is defined as the support (in the function sense) of the density function $f_X(x)$.

I say "in rather informal terms" because the density function is an intuitive and practical concept for dealing with probabilities, but not so much when speaking about probability in general and formal terms. For one thing, it is not a proper function for "discrete distributions" (again, a practical but loose concept).

In more formal/strict terms, the comment of Stefan fits the bill.

Do we interpret the support to be

- the set of outcomes in Ω which have a non-zero probability,
- the set of values that X can take with non-zero probability?

Neither, actually. Consider a random variable $X$ with a uniform density on $[0,1]$, defined on $\Omega = \mathbb{R}$. Its support is the full interval $[0,1]$, a proper subset of $\Omega$, so the first interpretation fails. Moreover, the point $x=1/2$ belongs to the support, yet the probability that $X$ takes exactly this value is zero, so the second interpretation fails as well.
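This can be checked numerically; a minimal sketch in plain Python (the interval-length formula below is specific to the uniform distribution on $[0,1]$):

```python
# Uniform distribution on [0, 1]: P((a, b)) is the length of (a, b) ∩ [0, 1].
def p_interval(a, b):
    """Probability that X lies in (a, b) for X ~ Uniform[0, 1]."""
    return max(0.0, min(b, 1.0) - max(a, 0.0))

# A single point has probability zero ...
print(p_interval(0.5, 0.5))          # 0.0
# ... yet every ball around x = 1/2, however small, has positive probability,
# which is why 1/2 belongs to the support.
for r in (0.1, 1e-3, 1e-9):
    assert p_interval(0.5 - r, 0.5 + r) > 0
```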


TL;DR

The support of a r.v. $X$ can be defined as the smallest closed set $R_X \in \mathcal{B}$ such that its probability is 1, as Did pointed out in their comment. An alternative definition is the one given by Stefan Hansen in his comment: the set of points in $\mathbb{R}$ around which any ball (i.e. open interval in 1-D) with nonzero radius has a nonzero probability. (See the section "Support of a random variable" below for a proof of the equivalence of these definitions.)

Intuitively, if any neighbourhood around a point, no matter how small, has a nonzero probability, then that point is in the support, and vice-versa.



I'll start from the beginning to make sure we're using the same definitions.

Preliminary definitions

Probability space

$\newcommand{\A}{\mathcal{A}} \newcommand{\powset}[1]{\mathcal{P}(#1)} \newcommand{\R}{\mathbb{R}} \newcommand{\deq}{\stackrel{\scriptsize def}{=}} \newcommand{\N}{\mathbb{N}}$ Let $(\Omega, \A, \Pr)$ be a probability space, defined as follows:

  • $\Omega$ is the set of outcomes

  • $\A \subseteq \powset{\Omega} $ is the collection of events, a $\sigma$-algebra

  • $\Pr\colon\ \A\to[0,1]$ is the mapping of events to their probabilities. It has to satisfy some properties:

    • $\Pr(\Omega) = 1$   (we know $\Omega \in \A$ since $\A$ is a $\sigma$-algebra of $\Omega$)
    • $\Pr$ has to be countably additive ($\sigma$-additive)

Random variable

A random variable $X$ is defined as a map $X\colon\; \Omega \to \R$ such that, for any $x\in\R$, the set $\{\omega \in \Omega \mid X(\omega) \le x\}$ is an element of $\A$, ergo, an element of $\Pr$'s domain to which a probability can be assigned.

We can think of $X$ as a "realisation" of $\Omega$, in that it assigns a real number to each outcome in $\Omega$. Intuitively, this condition means that we are assigning numbers to outcomes in an order such that the set of outcomes whose assigned number is less than a certain threshold (think of cutting the real number line at the threshold and forming the set of outcomes whose number falls on or to the left of that) is always one of the events in $\A$, meaning we can assign it a probability.
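As an illustration, here is a toy sketch with a finite sample space (the die and the particular map $X$ are invented for illustration); taking $\A = \powset{\Omega}$, every preimage is automatically an event:

```python
# Toy example: a fair die, with X(ω) = ω mod 2 (1 for odd faces, 0 for even).
omega = {1, 2, 3, 4, 5, 6}
X = {w: w % 2 for w in omega}

def preimage_leq(x):
    """X^{-1}((-∞, x]): the set of outcomes whose assigned value is at most x."""
    return {w for w in omega if X[w] <= x}

print(sorted(preimage_leq(-1)))  # []  — no outcome maps below 0
print(sorted(preimage_leq(0)))   # [2, 4, 6]
print(sorted(preimage_leq(1)))   # [1, 2, 3, 4, 5, 6]
# Each of these sets is an element of A = P(Ω), so X is a random variable.
```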

This is necessary in order to define the following concepts.

Cumulative Distribution Function of a r.v.

The probability distribution function (or cumulative distribution function) of a random variable $X$ is defined as the map $$ \begin{align} F_X \colon \quad \R \ &\to\ [0, 1] \\ x\ &\mapsto\ \Pr(X \le x) \deq \Pr(X^{-1}(I_x)) \end{align} $$

where $I_x \deq (-\infty, x]$. (NB: $X^{-1}$ denotes preimage, not inverse; $X$ might well be non-injective.)

For notational clarity, define the following:

  • $\Omega_{\le x} \deq X^{-1}((-\infty, x]) = X^{-1}(I_x)$
  • $\Omega_{> x} \deq X^{-1}((x, +\infty)) = X^{-1}(\overline{I_x}) = \overline{\Omega_{\le x}}$   where $\overline{\phantom{\Omega}}$ denotes set complement (in $\R$ or $\Omega$, depending on the context)
  • $\Omega_{< x} \deq X^{-1}((-\infty, x)) = \displaystyle\bigcup_{n\in\N} X^{-1} \left(I_{x-\frac{1}{n}}\right)$
  • $\Omega_{=x} \deq X^{-1}(\{x\}) = \Omega_{\le x} \setminus \Omega_{< x}$

All of these are still elements of $\A$, since $\A$ is a $\sigma$-algebra.

We can see that

  • $\Pr(X > x) \deq \Pr(\Omega_{>x}) = \Pr(\overline{\Omega_{\le x}}) = 1 - \Pr(\Omega_{\le x}) = 1 - F_X(x)$

  • $\Pr(X < x) \deq \Pr(\Omega_{<x}) = \Pr\left(\displaystyle\bigcup_{n\in\N} X^{-1} \left(I_{x-\frac{1}{n}}\right)\right)$ $= \lim\limits_{n \to \infty} \Pr(X \le x - \frac{1}{n}) = \lim\limits_{n \to \infty} F_X(x - \frac{1}{n}) = \lim\limits_{t \to x^-} F_X(t) \deq F_X(x^-)$

    since $X^{-1} \left(I_{x-\frac{1}{n}}\right) \subseteq X^{-1} \left(I_{x-\frac{1}{n+1}}\right)$ for all $n\in\N$, so the limit follows from the continuity from below of the measure $\Pr$.

  • $\Pr(X = x) \deq \Pr(\Omega_{=x}) = \Pr(\Omega_{\le x} \setminus \Omega_{<x})= \Pr(\Omega_{\le x}) - \Pr(\Omega_{<x}) = F_X(x) - F_X(x^-)$

and so forth.
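For a purely discrete distribution these identities can be verified directly; a small sketch with an illustrative pmf (exact rationals avoid floating-point noise):

```python
from fractions import Fraction as Fr

# Illustrative pmf: P(X = 0) = 1/5, P(X = 1) = 1/2, P(X = 3) = 3/10.
pmf = {0: Fr(1, 5), 1: Fr(1, 2), 3: Fr(3, 10)}

def F(x):
    """F_X(x) = P(X <= x)."""
    return sum(p for v, p in pmf.items() if v <= x)

def F_left(x):
    """F_X(x^-) = P(X < x)."""
    return sum(p for v, p in pmf.items() if v < x)

# The jump of the CDF at x recovers P(X = x):
print(F(1) - F_left(1))   # 1/2 — an atom at x = 1
print(F(2) - F_left(2))   # 0   — no atom at x = 2
```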

Note that the limit defining $F_X(x^-)$ always exists, because $F_X$ is nondecreasing (if $x < y$, then $\Omega_{\le x} \subseteq \Omega_{\le y}$, and $\Pr$ is $\sigma$-additive) and bounded above (by $1$); the monotone convergence theorem then guarantees that the images under $F_X$ of any nondecreasing sequence approaching $x$ from the left converge, and thus the one-sided limit $\lim_{t \to x^-} F_X(t)$ exists.


Probability measure on $\R$ by $X$

The mapping defined by $X$ is sufficient to uniquely define a probability measure on $\R$; that is, a map $$ \begin{align} P_X \colon \quad \mathcal{B} \subset \powset{\R} \ &\to \ [0, 1]\\ A \ &\mapsto \ \Pr(X \in A) \deq \Pr(X^{-1}(A)) \end{align} $$ that assigns to any set $A \in \mathcal{B}$ the probability of the corresponding event in $\A$.

Here $\mathcal{B}$ is the Borel $\sigma$-algebra in $\R$, which is, loosely speaking, the smallest $\sigma$-algebra containing all of the semi-intervals $(-\infty, x]$. The reason why $P_X$ is defined only on those sets is because in our definition we only required $X^{-1}(A) \in \A$ to be true for the semi-intervals of the form $A = (-\infty, x]$; thus $X^{-1}(A)$ is an element of $\A$ only when $A$ is "generated" by those semi-intervals, their complements, and countable unions/intersections thereof (according to the rules of a $\sigma$-algebra).
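In a discrete toy setting the pushforward measure is easy to compute: pull the set $A$ back through $X$, then measure the resulting event. A sketch (the die and the map $X$ are illustrative):

```python
from fractions import Fraction as Fr

# Fair die with X(ω) = ω mod 2, as an illustrative example.
omega = [1, 2, 3, 4, 5, 6]
X = {w: w % 2 for w in omega}
Pr = {w: Fr(1, 6) for w in omega}   # uniform probability measure on Ω

def P_X(A):
    """Pushforward measure: P_X(A) = Pr(X^{-1}(A))."""
    return sum(Pr[w] for w in omega if X[w] in A)

print(P_X({1}))        # 1/2 — probability of an odd face
print(P_X({0, 1}))     # 1   — total mass
```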




Support of a random variable

Formal definition

Formally, the support of $X$ can be defined as the smallest closed set $R_X \in \mathcal{B}$ such that $P_X(R_X) = 1$, as Did pointed out in their comment.

An alternative but equivalent definition is the one given by Stefan Hansen in his comment:

The support of a random variable $X$ with values in $\R^n$ is the set $\{x\in \R^n \mid P_X(B(x,r))>0, \text{ for all } r>0\}$ where $B(x,r)$ denotes the ball with center at $x$ and radius $r$. In particular, the support is a subset of $\R^n$.

The equivalence can be proven as follows:

Proof
Let $R_X$ be the smallest closed set in $\mathcal{B}$ such that $P_X(R_X) = 1$. Since $R_X$ is closed, its complement is open: for every $x \in \R \setminus R_X$, there exists a radius $r\in\R_{>0}$ such that the open interval (open ball in the more general case) $(x-r, x+r)$ is contained within $\R \setminus R_X$.

That, in turn, implies that $P_X((x-r, x + r)) = 0$: if this were strictly positive, then, since $(x-r, x+r)$ and $R_X$ are disjoint, $P_X(R_X \cup (x-r,x+r)) = P_X(R_X) + P_X((x-r, x+r)) > P_X(R_X) = 1$, a contradiction.

Conversely, suppose $P_X((x-r, x+r)) = 0$ for some $x\in\R$, $r\in\R_{>0}$. Then $(x-r, x+r) \subseteq \R \setminus R_X$ (and, in particular, $x \in \R \setminus R_X$). Otherwise $R_X' \deq R_X \setminus (x-r, x+r)$ would be a closed set (a closed set minus an open set is closed) strictly smaller than $R_X$ that still satisfies $P_X(R_X') = P_X(R_X) - P_X(R_X \cap (x-r,x+r)) = 1 - 0 = 1$, contradicting the minimality of $R_X$.

This proves $\R \setminus R_X = \{x\in\R \mid \exists r \in \R_{>0}\colon P_X((x-r, x+r)) = 0\}$.

Negating the predicate, one gets $R_X = \{x\in\R \mid \forall r \in \R_{>0}\colon P_X((x-r, x+r)) > 0\}$, which is exactly the ball-based definition. $\blacksquare$
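For a purely atomic distribution the ball criterion can be checked directly. A sketch (the pmf is illustrative; testing finitely many shrinking radii suffices here only because the atoms are isolated):

```python
from fractions import Fraction as Fr

# Atoms of an illustrative discrete distribution.
pmf = {0: Fr(1, 4), 2: Fr(1, 4), 5: Fr(1, 2)}

def P_ball(x, r):
    """P_X((x - r, x + r)) for the purely atomic measure above."""
    return sum(p for v, p in pmf.items() if x - r < v < x + r)

def in_support(x, radii=(1, Fr(1, 10), Fr(1, 1000))):
    """Check the ball criterion for a few ever-smaller radii; for a finite
    set of atoms, any radius below the smallest gap between atoms decides."""
    return all(P_ball(x, r) > 0 for r in radii)

print([x for x in range(7) if in_support(x)])   # [0, 2, 5]
```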

But more often, different definitions are given.


Alternative definition for discrete random variables

A discrete random variable can be defined as a random variable $X$ such that $X(\Omega)$ is countable (either finite or countably infinite). Then, for a discrete random variable the support can be defined as

$$R_X \deq \{x\in\R \mid \Pr(X = x) > 0\}\,.$$

Note that $R_X \subseteq X(\Omega)$, and thus $R_X$ is countable. We can prove this inclusion via its contrapositive: if $x \notin X(\Omega)$, then $x \notin R_X$.

Suppose $x \in \R$ with $x \notin X(\Omega)$. Then no outcome is mapped to $x$, i.e. $\Omega_{=x} = X^{-1}(\{x\}) = \emptyset$, so $\Pr(X = x) = \Pr(\emptyset) = 0$, and therefore $x \notin R_X$. (Equivalently, in terms of the CDF: the jump $F_X(x) - F_X(x^-) = \Pr(X = x)$ vanishes at every $x$ outside the range of $X$.)


Alternative definition for continuous random variables

Notice that for continuous random variables (that is, random variables whose distribution function $F_X$ is continuous on all of $\R$), $\Pr(X = x) = 0$ for every $x\in \R$, since, by continuity, $F_X(x^-) = F_X(x)$. But that doesn't mean that the outcomes in $X^{-1}(\{x\})$ are "impossible", informally speaking. For absolutely continuous random variables, i.e. those whose distribution function can be written as $F_X(x) = \int_{-\infty}^x f_X(t)\,dt$ for some density $f_X$, the support is thus defined as

$$ R_X = \overline{\{x \in \R \mid f_X(x) > 0\}}\,,$$

which intuitively can be justified as the set of points around which any interval, no matter how small, still carries a strictly positive amount of probability mass.
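The role of the closure can be seen with a density altered at a single point (a measure-zero change, so the distribution is the same): the point drops out of $\{x \mid f_X(x) > 0\}$ but remains in the support. A numeric sketch:

```python
# Illustrative density: uniform on [0, 1], but redefined to 0 at x = 1/2.
# Changing a density at one point changes nothing measure-theoretically.
def f(x):
    if x == 0.5:
        return 0.0
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def prob(a, b, n=10_000):
    """Riemann-sum approximation of the integral of f over (a, b) (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# f(1/2) = 0, so 1/2 is not in {x : f(x) > 0}; yet every ball around 1/2
# has positive probability, so taking the closure correctly keeps 1/2 in R_X.
print(f(0.5))             # 0.0
print(prob(0.4, 0.6))     # ≈ 0.2
```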


The support of the density function $f_X(\cdot)$ is the set of values of the random variable $X$ for which the density function is positive. That is,

$\mathcal{R}_X := \{x\in \R \mid f_X(x) > 0\}$

Note that $f_X(\cdot)$ is the probability density/mass function of the random variable $X$.