Proof of the Box-Muller method

This is Exercise 2.2.2 from Achim Klenke: »Probability Theory — A Comprehensive Course«.

Exercise (Box–Muller method): Let $U$ and $V$ be independent random variables that are uniformly distributed on $[0,1]$. Define $$X := \sqrt{-2\log(U)}\, \cos(2\pi V) \quad \text{and} \quad Y := \sqrt{-2\log(U)}\, \sin(2\pi V)\, .$$

Show that $X$ and $Y$ are independent and $\mathcal{N}_{0,1}$-distributed.

Solution: Define the random variable $R:= \sqrt{-2\log(U)}$. Then, for $r > 0$, \begin{align*} \mathbf{P}\bigl[R \leq r\bigr] & = \mathbf{P}\bigl[-2 \log(U) \leq r^2\bigr] \\ & = \mathbf{P}\Bigl[\log(U) \geq -\frac{r^2}{2}\Bigr] \\ & = 1 - \mathbf{P}\biggl[U < \exp\Bigl(-\frac{r^2}{2}\Bigr)\biggr]\, . \end{align*} Since $U$ is uniformly distributed on $[0, 1]$, the distribution of $R$ is $$\mathbf{P}[R\leq r] = 1 - \int_0^{\exp(-r^2/2)} \, dt = 1 - \exp\Bigl(-\frac{r^2}{2}\Bigr)\, .$$ For the density of $R$ we get $f_R(t) = t \exp\Bigl(-\frac{t^2}{2}\Bigr)$ for $t> 0$.
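(Aside, not part of the proof: a quick numerical sanity check of this step, assuming numpy is available, could look like the following; the empirical CDF of $R$ should match $1 - \exp(-r^2/2)$.)

```python
import numpy as np

# Sanity check: the empirical CDF of R = sqrt(-2 log U) should match 1 - exp(-r^2/2).
rng = np.random.default_rng(0)
U = rng.uniform(size=1_000_000)   # U in [0, 1); P(U == 0) is negligible
R = np.sqrt(-2.0 * np.log(U))

for r in (0.5, 1.0, 2.0):
    empirical = np.mean(R <= r)
    analytic = 1.0 - np.exp(-r**2 / 2.0)
    print(f"r = {r}: empirical {empirical:.4f} vs analytic {analytic:.4f}")
```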

We also define the random variable $\Phi := 2\pi V$. Since $V$ is uniformly distributed on $[0, 1]$, $f_\Phi(t) = \frac{1}{2\pi}$ for $0< t \leq 2\pi$.

Since $U, V$ are independent, $R, \Phi$ must also be independent and $$f_{R, \Phi}(t_1, t_2) = f_R(t_1) f_\Phi(t_2) = \frac{1}{2 \pi} \exp\Bigl(-\frac{t_1^2}{2}\Bigr)\cdot t_1 \, .$$

With \begin{align*} g\colon (0,\infty)\times(0, 2\pi] &\rightarrow \mathbb{R}^2 \\ (r, \phi) &\mapsto \bigl(r \cos(\phi), r \sin(\phi)\bigr) \end{align*} we see that $$(X, Y) = g(R, \Phi)\, ,$$ so we want to find the image measure $$\mathbf{P}_{X, Y} = \mathbf{P}_{R, \Phi}\circ g^{-1}\, .$$

We use the transformation formula for densities: $$ f_{X, Y}(\tau_1, \tau_2) = \frac{f_{R, \Phi}(g^{-1}(\tau_1, \tau_2))}{|\det(g'(g^{-1}(\tau_1, \tau_2)))|}$$

$g$ is just the polar-coordinate transformation, with Jacobian $$g'(r, \phi) = \begin{pmatrix} \cos(\phi) & -r\sin(\phi) \\ \sin(\phi) & r\cos(\phi) \end{pmatrix}\, , \qquad |\det(g'(r, \phi))| = r\, .$$ With $$ t_1 = \sqrt{\tau_1^2 + \tau_2^2} = |\det(g'(g^{-1}(\tau_1, \tau_2)))|$$ we finally get $$f_{X, Y}(\tau_1, \tau_2) = \frac{1}{2 \pi} \exp\Bigl(-\frac{\tau_1^2 + \tau_2^2}{2}\Bigr) = \underbrace{\frac{1}{\sqrt{2 \pi}} \exp\Bigl(-\frac{\tau_1^2}{2}\Bigr)}_{=f_X(\tau_1)} \cdot \underbrace{\frac{1}{\sqrt{2 \pi}} \exp\Bigl(-\frac{\tau_2^2}{2}\Bigr)}_{=f_Y(\tau_2)}\, ,$$ that is: $X, Y$ are $\mathcal{N}_{0, 1}$-distributed and independent. $\square$
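(Again as an aside: a minimal numerical cross-check of the conclusion, assuming numpy and scipy are available, might look like this; it generates $(X, Y)$ from the formulas above and tests the marginals against $\mathcal{N}_{0,1}$ and the sample correlation against $0$.)

```python
import numpy as np
from scipy import stats

# Generate (X, Y) with the Box-Muller formulas from the exercise.
rng = np.random.default_rng(1)
n = 200_000
U = rng.uniform(size=n)
V = rng.uniform(size=n)
X = np.sqrt(-2.0 * np.log(U)) * np.cos(2.0 * np.pi * V)
Y = np.sqrt(-2.0 * np.log(U)) * np.sin(2.0 * np.pi * V)

# Marginals: Kolmogorov-Smirnov test against N(0, 1); large p-values expected.
print(stats.kstest(X, "norm"))
print(stats.kstest(Y, "norm"))

# Independence cannot be fully tested this way, but the sample correlation should be ~0.
print(np.corrcoef(X, Y)[0, 1])
```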


Could you please check my proof? I'm sorry that it's so long; it seems right to me, but I'm self-studying and really need to catch any possible mistakes... Thank you!

I am a physicist, and (as a physicist) I can give a faster, intuitive derivation using inverse transform sampling: https://en.wikipedia.org/wiki/Inverse_transform_sampling

Imagine the $2D$ Gaussian density in polar coordinates $(\phi, r)$ as a surface over the plane, a bell-shaped cupola that is symmetric around the $Z$-axis: $\rho(r)=A\exp(-\frac12 r^2)$, where $A$ is a scaling constant such that $$\int_0^\infty 2\pi r\, \rho(r)\, dr = 1$$ (you can easily calculate the integral; $2\pi r\, dr$ is the area between two nearby concentric circles in the $x,y$ plane). You get this from the one-dimensional Gaussian density in Cartesian coordinates: $\rho(r)=\rho_1(x)\, \rho_1(y)$, where $\rho_1(x)=\sqrt{A}\exp(-\frac12 x^2)$, using $r^2 = x^2 + y^2$ (the Pythagorean theorem, i.e. the Euclidean metric).

Let's generate random $3D$ points uniformly under this Gaussian bell surface. Projecting those uniformly distributed samples in the volume, $(X,Y,Z)$ or $(r,\phi,z)$, onto the base plane, $(X,Y)$ or $(r,\phi)$, gives Gaussian-distributed samples in the $2D$ plane. Projecting those samples onto any direction (for example the $X$ or $Y$ axis) gives the desired one-dimensional Gaussian distribution.
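(A minimal sketch of this projection idea, assuming numpy is available: draw points uniformly in a box enclosing the bell surface, keep the points that lie under it, and the kept $(x, y)$ come out Gaussian. The $\pm 6$ box below is an arbitrary truncation for illustration.)

```python
import numpy as np

rng = np.random.default_rng(2)

def bell(x, y):
    # Height of the 2D standard Gaussian density surface at (x, y).
    return np.exp(-(x**2 + y**2) / 2.0) / (2.0 * np.pi)

# Draw points uniformly in a box that (practically) encloses the bell surface,
# keep the points lying under it, and project onto the (x, y) plane.
n = 2_000_000
x = rng.uniform(-6.0, 6.0, size=n)          # +/- 6 is an arbitrary truncation
y = rng.uniform(-6.0, 6.0, size=n)
z = rng.uniform(0.0, 1.0 / (2.0 * np.pi), size=n)
keep = z < bell(x, y)

# The kept x (and y) values should be approximately N(0, 1).
print("mean of x:", x[keep].mean(), " std of x:", x[keep].std())
```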

Think of the following mathematical construction. Split the whole volume of $1$ under the bell surface into small cells of equal volume $dV = dr \cdot (r\, d\phi) \cdot dh$ and attach an index label to them from $1$ to $N$, where $N$ is about $\frac{1}{dV}$; here $dr$ is the radial size, $r\, d\phi$ is the tangential size (perpendicular to $dr$), and $dh$ is the height along the $Z$ axis. Note that to keep the volume constant, $r\, d\phi$ has to be constant: the further from the origin, the larger $r$ and the smaller $d\phi$, so that this tangential dimension stays fixed.

Now you can generate uniformly distributed integers in the range $i = 1, \ldots, N$ (or real numbers from the interval $(0,1)$ multiplied by $N$) to pick a uniformly distributed index, i.e. a uniformly distributed location of an infinitesimal volume $dV_i$. This is random uniform volume sampling. (For the purists: to get true random point sampling you can combine it with a random triplet $(r_1, \phi_1, h_1)$ picking a random point within the chosen infinitesimal volume $dV_i$.) As argued above, uniform volume sampling leads to the Gaussian distribution of the projected samples. We may choose any indexing system of the infinitesimal volumes $dV$ with only one constraint: if $dV_i$ has radial coordinate $r_i$, $dV_j$ has radial coordinate $r_j$, and $r_i<r_j$, then $i<j$. The indexing grows with $r$. We leave the indexing within a cylindrical wall (cells with the same $r$) unspecified on purpose.

We will generate a real number $P$ uniformly from $(0,1)$; multiplied by $N$, it picks a volume $dV_i$ somewhere inside a cylindrical wall (of thickness $dr$) with radius $R$ and height $\rho(R)$. The probability, i.e. the fraction $P$ of uniformly distributed random points under the bell surface whose $r$ is closer to the origin than $R$ (inside the cylinder), is the integral from $0$ to $R$ of the height $\rho(r)$ of the surface at distance $r$ from the origin, times the thickness $dr$, times the circumference of the cylinder ($2\pi r$):

$P(r<R)=\int_0^R 2 \pi r \rho(r) dr$.

We can easily get a closed formula for this integral (this is the reason why we work in polar coordinates) by the simple substitution $\frac12 r^2=p$, $r\,dr=dp$:

\begin{align} P&=2\pi A \int_0^{\frac12 R^2} \exp(-p) dp \\ &=2\pi A \left[-\exp(-p)\right]_0^{\frac12 R^2} \\ &=2\pi A [1-\exp(-\frac12 R^2)] \end{align}

Notice that when $R$ goes to $\infty$ the integral must go to $1$ (normalization of the probability), so we have $A=1/(2 \pi)$ and hence $P = 1-\exp(-\frac12 R^2)$.
We solve for $R$:

$$R(P)=\sqrt{-2\ln(1-P)} $$

That is: a uniformly generated $P$ produces a properly distributed $R$ in polar coordinates. This is the general principle of inverse transform sampling (https://en.wikipedia.org/wiki/Inverse_transform_sampling): you generate uniform samples in a generalized volume or area under a distribution function and take the location (projection) of each sample. We know we should get a uniformly distributed $\phi$ coordinate as well. The index $i$ generated above under a fully specified indexing system (within the cylindrical wall) would provide this uniformly distributed $\phi$ too, due to symmetry. However, it is easier to ignore the details of the indexing within the cylindrical wall and take advantage of the symmetry of the bell surface around the $Z$-axis: thanks to the rotational symmetry we can simply generate a uniformly distributed $\phi \in (0,2\pi)$. The pair

$$(R,\phi)=\bigl(R(P),\, 2\pi U\bigr)\, ,$$

where $P$ and $U$ are independent random values uniformly distributed on the interval $(0,1)$, defines a sample point of the $2D$ Gaussian distribution in polar coordinates. At the end we can do a simple transformation of those samples into Cartesian coordinates, and we get the celebrated Gaussian distribution of random samples along the $X$ and $Y$ axes, independently:

$$x=R\cos(\phi) $$ $$y=R\sin(\phi)$$

In the formula for $R(P)$ above, notice that if $P$ is uniformly distributed on $(0,1)$, then $1-P$ is as well, so you can replace $1-P$ with $P$ in the method:

$$R(P)=\sqrt{-2\ln(P)} $$


I would like to give some complementary comments. They are less rigorous mathematically, but might be more intuitive in the sense that they involve fewer formulae, which on some level also serves as a proof.

In principle, since the Gaussian probability distribution is one-dimensional, one might wonder why one does not provide a generation mechanism with only one random variable (which, for instance, takes its value uniformly on the interval $[0, 1]$). This can usually be carried out by integrating the probability density. (In particular, it is not difficult to show that if the probability density $f(x)$ integrates to $1$, namely $\int dx\, f(x) = 1$, then the inverse of the function $y(x)=\int^x dx'\, f(x')$ provides precisely the desired distribution of $x$ when $y$ is drawn uniformly from the interval $[0, 1]$.)
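(To make the principle in that parenthesis concrete with a density whose integral does invert in closed form, namely the exponential distribution rather than the Gaussian, here is a minimal sketch assuming numpy is available: $F(x) = 1 - e^{-x}$, so $x = -\ln(1-y)$ with $y$ uniform on $(0,1)$.)

```python
import numpy as np

# Inverse transform sampling for the exponential distribution with rate 1:
# its CDF F(x) = 1 - exp(-x) inverts in closed form, x = -log(1 - y).
rng = np.random.default_rng(3)
y = rng.uniform(size=500_000)
x = -np.log(1.0 - y)

print("sample mean (expect 1):", x.mean())
print("P[x <= 2] (expect 1 - e^-2 ≈ 0.8647):", np.mean(x <= 2.0))
```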

For the present case, the problem is that the relevant integral (the Gaussian cumulative distribution function) does not possess a closed-form expression in elementary functions. Therefore, unless you carry out the procedure numerically, the above approach is somewhat cumbersome.

Now, the elegant solution is to consider two independent variables $x$ and $y$ whose values both follow a Gaussian probability distribution. If one views the two variables as the coordinates in a Cartesian coordinate system, then the corresponding variables in polar coordinates¹, namely $R$ and $\Theta$, satisfy a more convenient probability distribution. To be specific, these two variables are still independent²; moreover, $\Theta$ follows a uniform distribution on $(0, 2\pi)$, while $R$ follows the distribution $f(R)=Re^{-\frac{R^2}{2}}$, which integrates to $1$.
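(The claim in footnote 2 can also be made concrete with a one-line change of variables: with $x = R\cos\Theta$, $y = R\sin\Theta$ and $dx\,dy = R\,dR\,d\Theta$, $$\frac{1}{2\pi}\, e^{-\frac{x^2+y^2}{2}}\, dx\, dy = \Bigl(R\, e^{-\frac{R^2}{2}}\, dR\Bigr)\cdot\Bigl(\frac{1}{2\pi}\, d\Theta\Bigr)\, ,$$ so the joint density factors into a function of $R$ alone times a function of $\Theta$ alone, which is exactly independence.)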

Therefore, the significance of the coordinate transformation resides entirely in the fact that now we DO have an analytic expression for the integral $\int f(R)\, dR$, and the Box–Muller method can be viewed as a consequence of this convenience.

Hopefully this helps to clarify, to a certain extent, the motivation of the method for someone who, like me, was looking for an intuitive explanation. ^^


1. Here we use capital letters for consistency with the notation employed in the other answers.

2. This is intuitive.