What exactly is a random variable?

A random variable is nothing more or less than a function on a probability space with values in ${\mathbf R}$. First we need to be clear about what a probability space is, and then a random variable is a function from such a space to the real numbers. That is, what makes random variables special is the type of space on which they are defined.

A probability space is a set $\Omega$ on which we can talk sensibly about the probability of selecting an element from a subset of $\Omega$.

Example 1. Let $\Omega$ be the outcomes of flipping a coin twice: $\Omega = \{(H,H),(H,T),(T,H),(T,T)\}$. If the coin is fair and the flips are done independently, then we assign each element of $\Omega$ the probability 1/4, and from this we can produce further probabilities, such as the probability of having at least one head (that is, selecting from $\{(H,H),(H,T),(T,H)\}$), which is 1/4 + 1/4 + 1/4 = 3/4. If we had an unfair coin where $H$ comes up with probability 1/3 and $T$ comes up with probability 2/3, and the two flips are independent, then $(H,H)$ has probability 1/9, $(H,T)$ and $(T,H)$ each have probability 2/9, and $(T,T)$ has probability 4/9. Now the probability that two (independent) flips of this coin give at least one $H$ is 1/9 + 2/9 + 2/9 = 5/9 < 3/4. If the two coin flips were not independent, then you could not determine the probability of a particular pair of coin flips from knowledge of the probabilities of $H$ and $T$ separately.
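Here is a minimal sketch of Example 1 in code, modeling a finite probability space as a dictionary from outcomes to probabilities (the helper names `two_flip_space` and `prob` are my own, not part of the answer):

```python
from itertools import product

def two_flip_space(p_heads):
    """Probability space for two independent flips: P((a,b)) = P(a)*P(b)."""
    p = {"H": p_heads, "T": 1 - p_heads}
    return {(a, b): p[a] * p[b] for a, b in product("HT", repeat=2)}

def prob(space, event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(q for outcome, q in space.items() if outcome in event)

fair = two_flip_space(0.5)
biased = two_flip_space(1 / 3)

at_least_one_head = {w for w in fair if "H" in w}
print(prob(fair, at_least_one_head))    # 0.75      (= 3/4)
print(prob(biased, at_least_one_head))  # 0.555...  (= 5/9)
```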

Example 2. Let $\Omega$ be the unit disc in the plane, centered at the origin. One probability distribution on $\Omega$ is the uniform distribution: the probability that a randomly selected point from $\Omega$ lies in a subset $S$ is the area of $S$ divided by $\pi$ (the area of $\Omega$). Another probability distribution on $\Omega$ is the Dirac distribution at the origin: the probability that a randomly selected point from $\Omega$ lies in any subset $S$ of $\Omega$ is 1 if $(0,0) \in S$ and 0 otherwise. There are lots of probability distributions on the unit disc, and the particular choice you make determines what it means to pick a point from the disc randomly (according to the chosen probability distribution on the unit disc).
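A rough Monte Carlo illustration of the uniform distribution in Example 2, assuming rejection sampling from the enclosing square gives uniform points on the disc (the event chosen, the smaller disc of radius 1/2, is my own example):

```python
import random

def uniform_disc_point():
    """Sample uniformly from the unit disc by rejection from [-1,1]^2."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return (x, y)

# Event S = points within distance 1/2 of the origin.
# Uniform measure: P(S) = area(S)/pi = (pi/4)/pi = 1/4.
n = 100_000
hits = sum(1 for _ in range(n)
           if sum(c * c for c in uniform_disc_point()) <= 0.25)
print(hits / n)  # close to 0.25

# The Dirac distribution at the origin is different: there P(S) = 1
# exactly when (0,0) is in S, regardless of the area of S.
```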

In practice, a probability space is the set of possible outcomes of an experiment, but mathematically a probability space is just a set on which we can speak about the probability of selecting elements lying in its various subsets.

A random variable on a probability space $\Omega$ is a function $\Omega \rightarrow {\mathbf R}$, and for historical reasons such functions are denoted by letters like $X$ or $Y$ (rather than $f$ or $g$). Because $X$ has real values, we can define the probability that $X$ has values in the interval $[2,5]$, say, to be the probability of selecting an element of $\Omega$ from $\{\omega \in \Omega : 2 \leq X(\omega) \leq 5\}$. The probability $X$ has a positive value is declared to be the probability of selecting an element of $\Omega$ from $\{\omega \in \Omega : X(\omega) > 0\}$. In other words, because $X$ is defined on a set $\Omega$ in which we can talk about the probability of picking elements from subsets of $\Omega$, we can transfer this probability mechanism over to intervals in ${\mathbf R}$ by looking at the subsets of $\Omega$ consisting of those elements where $X$ has values in a chosen real interval.
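To make this transfer concrete, here is a sketch that computes the probability of a condition on the values of $X$ by pulling it back to a subset of $\Omega$, reusing the two-flip space of Example 1 with $X$ = number of heads (my choice of random variable for illustration):

```python
from itertools import product

space = {w: 0.25 for w in product("HT", repeat=2)}  # fair coin, two flips

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def prob_X_in(space, X, lo, hi):
    """P(lo <= X <= hi) = P({w in Omega : lo <= X(w) <= hi})."""
    return sum(p for w, p in space.items() if lo <= X(w) <= hi)

print(prob_X_in(space, X, 1, 2))  # P(at least one head) = 0.75
```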

In many areas of math, you study spaces by studying real-valued functions on them, such as continuous functions on topological spaces or smooth functions on manifolds. One hopes that important properties of the space are reflected in the types of real-valued functions you can reasonably define on that space (continuous, smooth,...). In probability theory, one probes the "structure" of a probability space $\Omega$ by working with the real-valued functions on $\Omega$ in order to transfer the probabilities of subsets of $\Omega$ to probabilities of subsets of ${\mathbf R}$, because on ${\mathbf R}$ you can do things that may not be directly possible in $\Omega$ itself (e.g., add the values of two random variables on $\Omega$; it may not make sense to add elements of $\Omega$ directly, such as the case of outcomes of coin flips as $H$ or $T$).

I have assumed that you don't know measure theory, so I glossed over measure-theoretic technicalities above in order to convey the basic idea of what a random variable is. In the language of measure theory, a probability space is a measure space whose total measure is 1, and a random variable on $\Omega$ is defined to be a real-valued measurable function on $\Omega$.


Start with a sample space $\Omega$,

$$\Omega =\{\text{all possible outcomes (not predetermined) of an experiment}\},$$

and write $n=\operatorname{card}(\Omega)$, so that $\Omega= \{\omega_i :\ 1\le i \le n\}$.

Now let's define a function $X$ such that $$\begin{array}{rccl}X:& \Omega& \rightarrow &\mathbb R \\& \omega_i&\mapsto &X(\omega_i)\end{array}$$ $X(\omega_i)$ is the value of the random variable $X$ at the outcome $\omega_i \in \Omega$.

Note that this function can be anything, as it depends on the experiment. Take for example the following (a short sketch in code of the second one appears after the list):

  • The outcome of throwing a die: $\Omega=\{1,2,3,4,5,6\}$ and $X(\omega_i)=\omega_i$.
  • Throwing two dice at a time and taking the sum of the results from both dice: $\Omega=\{1,2,3,4,5,6\} \times \{1,2,3,4,5,6\}$ and $X(\Omega)=\{2,3,4,\cdots,12\}$, with $X(\omega)=\alpha+\beta$ if $\omega=(\alpha,\beta)$.
  • Tossing a coin where Head scores $+1$ point and Tail scores $-1$ point: $\Omega=\{\text{Head}, \text{Tail}\}$ and $X(\Omega)=\{-1,1\}$, with $X(\text{Head})=+1$ and $X(\text{Tail})=-1$.
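Here is the promised sketch of the second bullet, computing the probability mass function of the dice sum by counting outcomes (the helper names are my own):

```python
from itertools import product
from collections import Counter

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def X(w):
    """Random variable: sum of the two dice."""
    a, b = w
    return a + b

# Probability mass function of X: P(X = k) = #{w : X(w) = k} / 36.
pmf = Counter(X(w) for w in omega)
for k in sorted(pmf):
    print(k, pmf[k] / 36)  # e.g. P(X = 7) = 6/36 = 1/6
```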

These are all discrete random variables (each described by a probability mass function). One can also have continuous random variables, where the outcomes take values in $\mathbb R$:

Let $X$ be a random variable whose distribution function $F_X$ has a derivative. The function $f_X$ satisfying $$F_X(x)= \int^x_{-\infty} f_X(t)\,dt$$ is called the probability density function of $X$, and $X$ is called a continuous random variable.
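As a small numeric check of this relation, take the exponential distribution $F_X(x) = 1 - e^{-x}$ for $x \ge 0$ (my choice of example, not from the answer above): its density is $f_X(x) = e^{-x}$, and a crude Riemann sum recovers $F_X$ from $f_X$:

```python
import math

def F(x):
    """Distribution function of an exponential(1) random variable."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Density: the derivative of F."""
    return math.exp(-x) if x >= 0 else 0.0

# Check F(x) = integral of f from -infinity to x via a simple Riemann sum.
x, h = 2.0, 1e-4
integral = sum(f(i * h) * h for i in range(int(x / h)))
print(F(x), integral)  # both close to 1 - e^{-2} ≈ 0.8647
```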


A random variable is a function whose value is known only after some underlying uncertainty about the world is resolved. For example, suppose there is some source of uncertainty in the physical world, say the temperature at noon tomorrow. Past history and weather models may give us a good idea of what that temperature is likely to be, so we may have a probability distribution for it, but we don't know in advance exactly what number it will turn out to be.

Now suppose you enter into a bet with a friend: for every degree the temperature is above 40 degrees, he pays you 1 dollar, and if the temperature is below 40 degrees, nothing happens. Your payoff from this bet depends on what temperature is measured at noon tomorrow; it can vary depending on what the temperature winds up being. The payoff from the bet is a function of the way the underlying uncertainty is resolved, so the cashflow from this bet is a random variable. A random variable is just a function whose value we don't know right now, a value that can differ depending on how some underlying uncertainty is resolved; but once the uncertainty is resolved, we know the value of the function.
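A hedged sketch of this bet in code: the payoff is a deterministic function of the uncertain temperature, and simulating the temperature resolves the uncertainty many times over (the normal model for temperature here is purely an assumption for illustration):

```python
import random

def payoff(temp):
    """One dollar per degree above 40; nothing otherwise."""
    return max(0.0, temp - 40.0)

# Simulate resolving the uncertainty many times; average the payoffs.
random.seed(0)
samples = [payoff(random.gauss(38.0, 5.0)) for _ in range(100_000)]
print(sum(samples) / len(samples))  # expected payoff under the assumed model
```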