How to formalize "conditional random variables"

I've been using "conditional random variables" as a notation aid with some good success in problem solving. But I've heard people claim that one shouldn't define conditional random variables.

By a conditional random variable for $X$ given $Y$, I mean a "pseudo" random variable $(X|Y)$ with the density function $f_{X|Y = y}(x) = \frac{f_{(X,Y)}(x,y)}{f_Y(y)}$.
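As a concrete illustration (taking the bivariate normal case purely as an example, not as part of the definition): if $(X, Y)$ is bivariate normal with means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$ and correlation $\rho$, then this recipe gives

$$(X \mid Y = y) \sim \mathcal{N}\!\left(\mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1 - \rho^2)\sigma_X^2\right),$$

so for each fixed value $y$ the "pseudo" random variable is just an ordinary random variable.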

Does this path lead to ambiguity or contradiction? It seems fairly straightforward to interpret $(X|Y)$ as a function from the sample space of $Y$ to random variables (for each value $y$ it returns $X$ conditioned on $Y = y$), so that $(X|Y)$ is a "random random variable". But is this abuse of notation sound?

More generally, what kinds of functions can be composed to make random variables while remaining consistent with "the" axioms of probability (i.e., some sensible foundation)?

Perhaps tangentially, is there a categorical interpretation? In particular, it would be nice if $(X|Y)$ and $Y$ are an adjoint pair.


This question has received some attention recently, so I thought I'd try to clarify it again:

I guess my question is: how can we define choosing a random variable randomly? After all, we can pick a random matrix, a random person, a random height, etc. So why not an arbitrary real-valued function?

Presumably, this would require a probability distribution that assigns densities to real-valued functions. This may not even be possible in the "general" case, and that might be a reason why the construction I'm trying to get at is unsound.

But it certainly seems that we can define conditional random variables for "classes" of random variables, for example by treating a parameter of a probability distribution as a random variable.
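For instance, here is a minimal numerical sketch of that two-stage construction (the Gamma/Poisson choice and the use of NumPy are purely illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-stage sampling: first "pick the random variable" by drawing the
# parameter, then draw from the distribution that parameter determines.
n = 100_000
lam = rng.gamma(shape=3.0, scale=2.0, size=n)  # Lambda ~ Gamma(shape=3, scale=2)
x = rng.poisson(lam)                           # X | Lambda = lam  ~  Poisson(lam)

# Marginally, X is an ordinary random variable (here a Gamma-Poisson mixture,
# i.e. a negative binomial), so the two-stage construction stays within the
# usual axioms.
print(x.mean())    # should be close to E[Lambda] = 3 * 2 = 6
print(lam.mean())  # likewise close to 6
```

The outer draw picks "which random variable" we are looking at, and the inner draw samples from it; marginally, the result is still just an ordinary random variable.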

Conditional expectation seems to be another instance of the idea.

So there seems to be a tension between these instances and the "fact" that it can't be done in general. I am hoping someone can speak to that tension. :-)


But is this abuse of notation sound?

As others have noted in the comments, the answer is: not quite. But to help you understand why not, it may be helpful to read about the concept of conditional expectation, which may be the closest formal approximation of what you're trying to get at.

The setup for the definition requires you to brush up on your measure-theoretic probability, and consists of:

  • A probability space $(\Omega, \mathcal{F}, P)$.
  • A random variable $X : \Omega \to \mathbb{R}^n$.
  • Another random variable $Y : \Omega \to U$ (where $(U, \Sigma)$ is some other measurable space).

The conditional expectation $\mathbb{E}( X \mid Y )$ is, in a precise sense, the $L_2$-closest $Y^{-1}(\Sigma)$-measurable approximation of $X$ (at least when $X$ is square-integrable). That is, it answers the question: what is the most that we can know about $X$ given the information we can glean from observing $Y$?

More formally, letting $\mathcal{H} = Y^{-1}(\Sigma)$, $\mathbb{E}(X \mid Y)$ is an $\mathcal{H}$-measurable random variable (i.e. it is at least as "coarse" as $Y$) which is guaranteed to agree with $X$ on average over any event $H \in \mathcal{H}$:

$$ \int_H \mathbb{E}(X \mid Y) \, dP = \int_H X \, dP. $$

Its existence is proved via the Radon–Nikodym theorem.
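Here is a minimal numerical sketch of that defining property (the specific toy model and the use of NumPy are illustrative assumptions of mine, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy model where E(X | Y) is known in closed form:
# Y ~ N(0, 1), X = Y^2 + independent noise, hence E(X | Y) = Y^2.
n = 1_000_000
y = rng.standard_normal(n)
x = y**2 + rng.standard_normal(n)
e_x_given_y = y**2

# Defining property: for any event H determined by Y alone,
# the integrals of X and of E(X | Y) over H agree.
H = y > 0.5
print(np.mean(x * H))            # Monte Carlo estimate of the integral of X over H
print(np.mean(e_x_given_y * H))  # Monte Carlo estimate of the integral of E(X | Y) over H
```

The two printed estimates should agree up to Monte Carlo error, even though $X$ and $\mathbb{E}(X \mid Y)$ differ pointwise.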

And furthermore,

Perhaps tangentially, is there a categorical interpretation?

while I don't have a strong grasp of category theory and so won't try to explain it in categorical terms, conditional expectation does have a nice interpretation in terms of factorization / commutative diagrams, as can be seen on the Wikipedia page :)
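For what it's worth, the factorization in question comes from the Doob–Dynkin lemma: any $\sigma(Y)$-measurable random variable is a measurable function of $Y$, so in particular

$$\mathbb{E}(X \mid Y) = g \circ Y \qquad \text{for some measurable } g : U \to \mathbb{R}^n,$$

and $g(y)$ is exactly what one usually writes as $\mathbb{E}(X \mid Y = y)$.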