How do I sample a random rotation matrix in $n$ dimensions?

Let $M$ be an $n\times n$ matrix. $M$ is a rotation matrix if $\|Mx\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^n$. I want to sample uniformly at random from the rotations, in the following sense. Let $\mathcal{S}=\{x \in \mathbb{R}^n : \|x\|_2=1\}$ denote the unit sphere. I want a distribution on $M$ so that, if you fix any point $x \in \mathcal{S}$, then $Mx$ will be uniformly distributed on $\mathcal{S}$.

How can I sample a matrix $M$ from this distribution? I am looking for an algorithmic procedure I can use to sample such a matrix randomly.

I believe this corresponds to sampling uniformly at random from $O(n)$ (the orthogonal group). I am fine with either a procedure for sampling uniformly at random from $O(n)$ or $SO(n)$; it should be easy to convert between the two of them.


Here's a fun indirect way to do it: sample a random $n \times n$ real matrix $M$ from the Gaussian orthogonal ensemble. Explicitly this means that the diagonal entries $M_{ii}$ are independent Gaussians $N(0, 2)$ (mean $0$, variance $2$) and the off-diagonal entries $M_{ij} = M_{ji}$ are independent Gaussians $N(0, 1)$, subject to the constraint that $M$ is symmetric. Now compute the eigendecomposition $M = UDU^{\dagger}$. Then $U \in O(n)$ will be distributed according to Haar measure, up to one caveat: each eigenvector is only determined up to sign, so if your solver fixes signs by some deterministic convention, multiply each column of $U$ by an independent uniformly random sign to get exactly Haar measure. (Maybe it would be better to use the QR decomposition, I don't know.)

There are efficient methods for both generating a batch of independent Gaussians and computing eigendecompositions in, say, Octave or MATLAB, so this can be done with very little code, although I don't know what the asymptotics are or how much better it's possible to do. In Octave it is something like this (untested):

A = randn(n); # generates an nxn matrix all of whose entries are N(0, 1)
M = (A + A.') / sqrt(2); # symmetrizes A, now a sample from GOE
[V, E] = eig(M); # computes eigenvectors V and eigenvalues E of M
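For those who prefer NumPy, here is my own sketch of the same procedure (not from the original answer), including the random-sign step to undo the solver's deterministic sign convention on the eigenvectors:

```python
import numpy as np

def haar_orthogonal_goe(n, seed=None):
    """Sample a Haar-distributed matrix in O(n) via the eigenvectors of a GOE matrix."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))     # i.i.d. N(0, 1) entries
    M = (A + A.T) / np.sqrt(2)          # symmetrize: a sample from the GOE
    _, V = np.linalg.eigh(M)            # orthonormal eigenvectors of M
    # eigh fixes each eigenvector's sign by an internal convention;
    # randomizing the column signs removes that bias
    V *= rng.choice([-1.0, 1.0], size=n)
    return V

Q = haar_orthogonal_goe(5, seed=0)
```

The returned matrix is orthogonal by construction (`Q @ Q.T` is the identity up to floating-point error), which is an easy sanity check to run.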

As a more direct (and, I suspect, much more computationally efficient) version of Qiaochu Yuan's approach, we can bypass the need for a diagonalization with a slightly simpler ensemble, and use the fact that the Gram-Schmidt process is $O(n)$-equivariant.

Define a random $n\times n$ matrix $M$ where each entry is an i.i.d. $N(0,1)$ Gaussian. This ensemble is $O(n)$-invariant, in the sense that $p_M(A)=p_M(OA)$, where $p_M$ is the probability density of $M$ and $O\in O(n)$. To avoid numerical issues, you can freely reject matrices with $|\det(M)|<\epsilon$ for some fixed $\epsilon$, and still obtain an $O(n)$-invariant ensemble, since the absolute value of the determinant is $O(n)$-invariant.

Applying the Gram-Schmidt process $\Gamma$ to $M$ yields a new random matrix $\Gamma(M)\in O(n)$. Since $\Gamma(OA)=O\,\Gamma(A)$ for $O\in O(n)$, the distribution of $\Gamma(M)$ is invariant under left multiplication by $O(n)$, which is to say it is uniform with respect to the Haar measure on $O(n)$.
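In practice one implements Gram-Schmidt via a QR decomposition, which is numerically stabler. A caveat worth flagging: library QR routines do not fix the signs of the diagonal of $R$, so the raw $Q$ is not exactly the Gram-Schmidt output; forcing $\operatorname{diag}(R) > 0$ recovers it. Here is my own NumPy sketch of this (the sign fix is an assumption about `np.linalg.qr`'s convention, not something stated in the answer above):

```python
import numpy as np

def haar_orthogonal_qr(n, seed=None):
    """Sample a Haar-distributed matrix in O(n) by orthonormalizing a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))  # i.i.d. N(0, 1) entries
    Q, R = np.linalg.qr(M)           # QR = numerically stable Gram-Schmidt
    # np.linalg.qr leaves the signs of diag(R) arbitrary; forcing them
    # positive makes Q the true Gram-Schmidt output, hence Haar-distributed
    Q *= np.sign(np.diag(R))
    return Q

Q = haar_orthogonal_qr(4, seed=0)
```

No rejection step is needed here, since a Gaussian matrix is singular with probability zero; the rejection in the answer above is only a guard against near-singular matrices in floating point.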