Algebra of Random Variables?

I've been looking online (and in teaching journals) for a good introduction to Algebras of Random Variables (on an undergraduate level) and their usage, and have come up short. I know I can find the probability distribution of $h(z)$ where:

\begin{equation*} z = x + y. \end{equation*}

If $x$ and $y$ are from known independent probability distributions (the solution is simply a convolution). Two other operations $z=xy$ and $z=y/x$ can be solved for quite easily as well.

Does anyone know of any other, more complicated, uses for treating random variables as objects to be manipulated?


In the special case that the sample space is the non-negative integers (or a subset thereof), one can think of a probability distribution as a generating function $f(x) = \sum_{n \ge 0} a_n x^n$ where $a_n \ge 0$ and $f(1) = 1$. Then the sum of random variables corresponds to the product of generating functions, so one can bring generating function techniques (see, for example, Wilf) to bear on such random variables. For example, it is particularly easy to compute expected values this way: the expected value is $f'(1)$, and the product rule expresses the fact that expected value is additive. Similarly, the variance is $f''(1) + f'(1) - f'(1)^2$.

I don't really know a place where these issues are discussed in detail, but one spectacular family of examples is the computation of the expected values and variances of certain statistics on permutations. For example, suppose we want to compute the expected number of fixed points that a permutation of $n$ elements has. By Burnside's lemma, the answer is $1$. But another way to do this computation is to construct the family of polynomials

$\displaystyle P_n(x) = \frac{1}{n!} \sum_{\pi \in S_n} x^{c_1(\pi)}$

where $c_1(\pi)$ is the number of fixed points. Then the number we want is $P_n'(1)$. It turns out we can compute all these numbers at the same time because the bivariate generating function is

$\displaystyle P(x, y) = \sum_{n \ge 0} P_n(x) y^n = \frac{1}{1 - y} \exp \left( xy - y \right).$

Then the family of numbers we want is $\frac{\partial}{\partial x} P(x, y)$ evaluated at $x = 1$, which (as it is not hard to verify) is $\frac{1}{1 - y}$. In fact this is true for the second derivative and all higher derivatives as well; in particular, the variance of the number of fixed points is also $1$.

What if we want to know the expected number and variance of, say, the total number of cycles? Now we want to look at the family of polynomials

$\displaystyle Q_n(x) = \frac{1}{n!} \sum_{\pi \in S_n} x^{c(\pi)}$

where $c(\pi)$ is the total number of cycles. Then the number we want is $Q_n'(1)$. Now it turns out that the bivariate generating function is

$\displaystyle Q(x, y) = \sum_{n \ge 0} Q_n(x) y^n = \frac{1}{(1 - y)^x}$

(which should be interpreted as $\exp \left( x \log \frac{1}{1 - y} \right)$). The partial derivative $\frac{\partial}{\partial x} Q(x, y)$ evaluated at $x = 1$ is now

$\displaystyle \frac{1}{1 - y} \log \frac{1}{1 - y} = \sum_{n \ge 1} H_n y^n$

where $H_n$ is the $n^{th}$ harmonic number. Thus the expected number of cycles of a permutation of $n$ elements is about $\log n$. (It actually turns out that the expected number of cycles of length $r$ is $\frac{1}{r}$, from which this result immediately follows.) The second partial derivative evaluated at $x = 1$ is

$\displaystyle \frac{1}{1 - y} \log^2 \frac{1}{1 - y} = \sum_{n \ge 1} G_n y^n$

where $\displaystyle G_n = \sum_{k=1}^{n-1} \frac{1}{k} H_{n-k}$; I'm not sure of the asymptotic growth of this sequence, though, but whatever it is, the variance of the total number of cycles is $G_n + H_n - H_n^2$. (In any case $G_n \le H_n^2$, so the variance is less than or equal to $H_n$, and this is probably about right asymptotically.) One can deduce asymptotics for these kind of sequences using methods such as those in Flajolet and Sedgewick's Analytic Combinatorics, which is my best guess for more examples of using generating functions in this way. There are probably examples there related to statistics of trees.

All the generating function identities I used above are a consequence of the exponential formula, one version of which is proven and discussed in this blog post.


There are nice algebraic relationships for special families. The sum of normal (Cauchy, Levy) random variables is normal (Cauchy, Levy). The product of log-normal random variables is log-normal. The sum of gamma random variables is gamma if the distributions have a common scale parameter, etc. See more here.