Why are randomly drawn vectors nearly perpendicular in high dimensions

A random uniform unit vector is $X/\|X\|$ where $X$ is standard normal, thus the scalar product of two independent unit vectors $U$ and $V$ is $\langle U,V\rangle=\langle X,Y\rangle/(\|X\|\cdot\|Y\|)$ where $X$ and $Y$ are independent and standard normal. When $n\to\infty$, by the law of large numbers, $\|X\|/\sqrt{n}\to1$ almost surely and $\|Y\|/\sqrt{n}\to1$ almost surely, and by the central limit theorem, $\langle X,Y\rangle/\sqrt{n}$ converges in distribution to a standard one-dimensional normal random variable $Z$.

Thus, $\sqrt{n}\cdot\langle U,V\rangle\to Z$ in distribution, in particular, for every $\varepsilon\gt0$, $P(|\langle U,V\rangle|\geqslant\varepsilon)\to0$. In this sense, when $n\to\infty$, the probability that $U$ and $V$ are nearly orthogonal goes to $1$.

Likewise, $k$ independent uniform unit vectors are nearly orthogonal with very high probability when $n\to\infty$, for every fixed $k$.


Here is one way to reason, chosen for simplicity of calculations: Consider the unit vector $e=(1,0,0,\ldots,0)\in\mathbb R^n$. One way to measure how 'orthogonal' $e$ is to other vectors is to calculate the average of $(e\cdot x)^2$ as $x$ ranges over the unit sphere. If $S$ denotes the surface measure on the unit sphere corresponding to (normalized) area, then $$ \int |e\cdot y|^2 dS(y) =\int |y_1|^2 dS(y)=\frac{1}{n}\int \sum_{j=1}^n |y_j|^2 dS(y)=\frac{1}{n}. $$ Thus, in this sense, vectors are generally 'more' orthogonal in higher dimensional spaces.

Edit: This line of reasoning follows closely the argument given by JyrkiLahtonen in the comments above, as one sees by considering a random $\mathbb R^n$-valued vector $Y$, uniformly distributed on the unit sphere. If we consider the random variable $e\cdot Y$, then $$ E \; e\cdot Y=\int e\cdot y \;dS(y)=0, $$ because $S$ in invariant under the transformation $y\mapsto -y$. On the other hand $$ V(e\cdot Y)=\int |e\cdot y|^2 dS(y) =\frac{1}{n}, $$ as shown above. Therefore, intuitively, $e\cdot Y$ is small when $n$ is large. Rigorously, we can employ Chebyshev's inequality to obtain $$ P(|e\cdot Y|\geq \epsilon)\leq \frac{1}{n\epsilon^2}. $$


Here's an excerpt from Lecture 2 of Keith Ball's An Elementary Introduction to Modern Convex Geometry:

excerpt

See the link for details, but as the image shows, the measure of a spherical cap cut off by a hyperplane which is $\varepsilon$ away from the origin is bounded above by the ratio between volume of a ball of radius $\sqrt{1-\varepsilon^2}$ and the volume of a ball of radius 1, that is, $(1-\varepsilon^2)^{n/2}$. So if $X$ and $Y$ are independent random unit vectors (uniform on the sphere), then $$ P(\langle X,Y\rangle < \varepsilon) \ge 1 - (1-\varepsilon^2)^{n/2} \ge 1 - e^{-n\varepsilon^2/2} $$ which is close to $1$ when $n$ is large (if $\varepsilon$ is fixed, at least).


In statistics orthogonality is deeply linked to correlation. Random vectors are orthogonal when the values they contain are uncorrelated, meaning that in the $x_i$ values of $\vec{v_1}$ and $\vec{v_2}$ do not appear common patterns. The higher the dimension is the higher is $i$ and so the probability is higher that $x_{i}$ value of $\vec{v_1}$ and the $x_i$ values of $\vec{v_2}$ have no common pattern, because the higher the dimension the higher the possibilities (in an urn problem increase the number of balls ...). Equivalent to the statement that $\vec{v_1}$ and $\vec{v_2}$ have no common patterns is the statement that they are uncorrelated what is equivalent to the statement that they are orthogonal.

So the key to understand why randomly drawn vectors are perpendicular in high dimensions is the deep link between orthogonality and statistical correlation :

https://stats.stackexchange.com/questions/171324/what-is-the-relationship-between-orthogonal-correlation-and-independence

This is linked to a more philosophical question if real randomness actually exists. There are strong hints that this is not the case, i.e. nonlinear dynamical systems (chaotic systems) are able to generate white noise in a deterministic way : https://ieeexplore.ieee.org/document/1236865 - Chaotic maps generating white noise