Can we really compose random variables and probability density functions?

A renowned professor of statistics (whose name I will not reveal here) told me that the notation $p(x)$ makes perfect sense when $p$ is a pdf and $x$ is a RANDOM variable (i.e. a function). I was surprised, because I had never thought of a pdf as accepting functions as input; but, in fact, $p(x)$ here denotes the composition of the pdf with the r.v. $x$, i.e. the composition of functions $p \circ x$.

This information revolutionized my view of statistics and the way I read expressions like $p(x)$ in many formulas. I used to think that $p(x)$ was an output (a number) of the function $p$ (e.g. a pdf) evaluated at the point $x$ of its domain, even though, in certain cases, it seemed like $p(x)$ needed to be a function (I just assumed that whoever had written that was careless and wrote $p(x)$ instead of $p$). Now what those people had written, i.e. $p(x)$, probably makes sense, because $p(x)$ is a function, and in fact a random variable, given that $x$ is a random variable.

So, formally, why does it really make sense to compose random variables and p.d.f.s? An r.v. $x$ is typically defined as a function $x \colon \Omega \to E$, where $\Omega$ is the sample space and $E$ is a measurable space (e.g. $E = \mathbb{R}$ with the Borel $\sigma$-algebra). What are the domain and codomain of the pdf? The domain should be $E$, because otherwise how could we compose $p$ (the pdf) with $x$ (the random variable)?
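Writing out the types under that convention (my own summary, assuming the pdf is defined on all of $E$) makes the composition explicit:

$$ \Omega \xrightarrow{\;x\;} E \xrightarrow{\;p\;} [0, \infty), \qquad (p \circ x)(\omega) = p(x(\omega)), $$

so $p \circ x$ is a function $\Omega \to [0, \infty)$, i.e. a candidate random variable, provided $p$ is measurable.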

Moreover, in many cases, we define what is apparently a pdf, and then we use it in places that require "probability distributions" or "random variables". For example, on page 13 of these notes, we define the multivariate Gaussian pdf as follows:

$$ p(x)=\frac{1}{(2 \pi)^{n / 2} \operatorname{det}(\Sigma)^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right) $$
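(As a sanity check on how I used to read this formula, i.e. evaluating $p$ at a concrete point of its domain, here is a minimal NumPy sketch. The names `gaussian_pdf`, `mu`, `Sigma` are mine, and I compare against `scipy.stats.multivariate_normal` as a reference.)

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at a single point x in R^n."""
    n = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])  # a point of the domain, not a random variable

print(gaussian_pdf(x, mu, Sigma))             # direct use of the formula
print(multivariate_normal(mu, Sigma).pdf(x))  # scipy reference value
```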

I thought that the $x$ in the formula above was the dummy variable of the Gaussian pdf (at least, that's how I used to read the formula), i.e. an element of its domain. But, after that definition, the author derives the analytic expression for the KL divergence using $x$ as a random variable: at some point he takes an expectation involving $x$, and, as far as I know, we can only take expectations of random variables (with respect to distributions), so $x$ must be a random variable there. So, is the $x$ in the definition of the Gaussian pdf above also a random variable, and does that mean that the pdf (denoted by $p(x)$) is also a random variable?
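To convince myself that $p(x)$, with $x$ a random variable, is something we can take expectations of, here is a small Monte Carlo sketch (the variable names are mine). It draws realizations of $p(X)$ by evaluating the pdf at samples of $X$, and checks $\mathbb{E}[-\log p(X)]$ against the closed-form differential entropy of the Gaussian, $\frac{n}{2}\log(2\pi e) + \frac{1}{2}\log\det(\Sigma)$, which is the kind of expectation that appears in the KL derivation:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
dist = multivariate_normal(mu, Sigma)

# Each sample is one realization X(omega); evaluating the pdf at it gives
# one realization of the random variable p(X), i.e. the composition of p and X.
samples = dist.rvs(size=200_000, random_state=rng)
log_p_of_X = dist.logpdf(samples)  # realizations of log p(X)

# E[-log p(X)] is the differential entropy of the Gaussian.
n = mu.shape[0]
mc_entropy = -log_p_of_X.mean()
exact_entropy = 0.5 * n * np.log(2 * np.pi * np.e) + 0.5 * np.log(np.linalg.det(Sigma))
print(mc_entropy, exact_entropy)  # should agree to roughly two decimal places
```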


Solution 1:

If $X$ is a random variable defined on $(\Omega, \mathcal F, P)$ and $f: \mathbb R \to \mathbb R$ is any Borel measurable function, then $Y=f(X)$ is defined as the random variable on $(\Omega, \mathcal F, P)$ such that $Y(\omega)=f(X(\omega))$. This is indeed a random variable (in the sense that it is a real-valued measurable function on $(\Omega, \mathcal F, P)$). In particular, any pdf is a Borel measurable function $\mathbb R \to \mathbb R$, so $p(X)$ makes perfect sense.
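A minimal sketch making the pointwise definition concrete (the four-point sample space and the values of $X$ are hypothetical; $p$ is the standard normal pdf):

```python
from scipy.stats import norm

# Hypothetical four-point sample space; X assigns a real number to each outcome.
Omega = ["a", "b", "c", "d"]
X = {"a": -1.3, "b": 0.0, "c": 0.7, "d": 2.1}

# p is the standard normal pdf, a Borel measurable function R -> R.
p = norm.pdf

# Y = p(X) is again a function on Omega, defined pointwise: Y(omega) = p(X(omega)).
Y = {omega: p(X[omega]) for omega in Omega}
print(Y)
```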