In a choir, several people sing the same note at the same time. The sound each person makes consists of a tonic and certain overtones, but much of what we hear is the tonic frequency, so I'd like to concentrate on that.

A very crude approximation is that the tonic is a sine-wave of some frequency, $$ H(t) = A \sin (kt + d) $$ where the constant $k$ determines the frequency, and the phase, $d$, determines the instant at which the signal reaches its peak: if you and I begin singing at slightly different moments, we'll each have our own $d$ value. To simplify things, let's assume that we adjust our units of time to make $k = 1$, and adjust our units of sound intensity to make $A = 1$, so that $$ H(t) = \sin(t + d). $$

By the principle of superposition, the sound made by several singers looks like $$ C(t) = \sum_{i = 1}^n \sin (t + d_i) $$ where the $d_i$ are (I would guess) uniformly distributed random variables in the interval, say, $0 \le d_i \le 2\pi$.

It seems to me that for any given time $t_0$, the values $\sin(t_0 + d_i)$ are distributed between $-1$ and $1$, with a distribution that's symmetric about $0$: any given singer's sine-wave is just as likely to be in the "negative" half-cycle as in the positive one, etc.

That is to say: it appears that the expected value of the sound produced is zero.

I recognize that this isn't exactly the right question to ask, for this looks at the expectation over all sets of phases rather than for a specific choir. One might form a choir (i.e., a set of phases) which was nice and loud, and then offset everyone by a half-cycle and get another nice loud choir, but the sum of the two choirs would be zero, which is no problem: the individual choirs were plenty loud.

I guess the question I have is then this:

What's the expected maximum of $|C(t)|$, on the interval $0 \le t \le 2\pi$, with the expectation taken with respect to iid uniform choices of the phases $d_i$? Experiments in MATLAB suggest to me that it might be something around $0.9 \sqrt{n}$ (where $0.9$ must surely come from some weird combination of constants involving $\pi$, etc.).
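For anyone who wants to reproduce this kind of experiment, here is a minimal Python sketch of a Monte Carlo estimate. It is not the original MATLAB code (which isn't shown here); the function names `max_abs_choir` and `estimate_ratio`, the grid size, the trial count, and the seed are all illustrative assumptions.

```python
import math
import random


def max_abs_choir(phases, grid=128):
    """Return max |C(t)| over a uniform grid of t values in [0, 2*pi)."""
    best = 0.0
    for k in range(grid):
        t = 2.0 * math.pi * k / grid
        # C(t) is the superposition of unit-amplitude sines with the given phases.
        c = sum(math.sin(t + d) for d in phases)
        best = max(best, abs(c))
    return best


def estimate_ratio(n, trials=200, grid=128, seed=0):
    """Monte Carlo estimate of E[max_t |C(t)|] / sqrt(n) for n singers."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = 0.0
    for _ in range(trials):
        # iid uniform phases on [0, 2*pi], as assumed in the question
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
        total += max_abs_choir(phases, grid)
    return total / (trials * math.sqrt(n))


if __name__ == "__main__":
    for n in (10, 25, 50):
        print(n, round(estimate_ratio(n), 3))
```

If the $0.9\sqrt{n}$ guess is about right, the printed ratios should stay roughly constant as $n$ grows. The grid maximum slightly underestimates the true maximum, but for a pure sinusoid sampled at 128 points the relative error is tiny.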

The peculiar thing is that a 100-person choir seems to me to be much louder than just 10 times the loudness of a single singer. Given the logarithmic nature of perception for most senses, this seems to wildly contradict the estimate I gave above.

Can someone suggest some insight into this?

[Let's assume, for the sake of argument, that the choir is arranged in a circle around me, the conductor, so that if everyone sings at exactly the same moment, the sounds all reach my (single) ear at exactly the same moment, OK?]

I realize that the larger question of why choirs work is a combination of perception, physics, math, and probably some other things, but the math question here is about the expected amplitude of a sum of random-phase sine-waves, and that's what I'm hoping to have answered here on MSE.


A better approach than the one I was taking in the comments is to view the whole process as a question about random vectors in the plane. Let us define $$ X_j := \begin{bmatrix}\cos d_j \\ \sin d_j\end{bmatrix}\quad \text{and} \quad S_n := \sum_{j=1}^{n} X_j,$$ where $(d_j)_{j \ge 1}$ is a family of iid uniform random variables on $[0, 2 \pi]$. Then $S_n$ is a random vector in $\mathbb{R}^2$ whose first and second coordinates are, respectively, $$ A_n = \sum_{j=1}^n \cos d_j \quad \text{and} \quad B_n = \sum_{j=1}^n \sin d_j. $$

As mentioned in the comments, the angle-addition formula gives $$ C(t) = \sum_{i = 1}^n \sin (t + d_i) = \sin t \Big(\sum_{j=1}^n \cos d_j\Big) + \cos t \Big( \sum_{j=1}^n \sin d_j \Big) = \langle (\cos t, \sin t), (B_n, A_n) \rangle. $$ Since $(\cos t, \sin t)$ runs over the whole unit circle as $t$ ranges over $[0, 2\pi]$, the Cauchy-Schwarz inequality (with equality when the two vectors are parallel) implies $$ \sup_{t \in [0,2\pi]} |C(t)| = \sqrt{A_n^2 + B_n^2} = \lVert S_n \rVert. $$

Thus, you just want to understand the behavior of $S_n$, a sum of iid random vectors. Notice that the two coordinates of $X_j$ have the same distribution, and that $$ \mathbb{E}[\cos d_j] = \mathbb{E}[\sin d_j] = 0 \quad \text{and} \quad \mathbb{E}[\cos^2 d_j] = \mathbb{E}[\sin^2 d_j] = \frac12, $$ where the second identity follows by applying the expected value operator to $\sin^2 d_j + \cos^2 d_j = 1$ and using the equality of the two distributions. Also, we have $$ \mathbb{E}[\cos d_j \sin d_j] = \frac12 \mathbb{E}[\sin (2d_j)] = 0. $$ Thus we can apply the multidimensional Central Limit Theorem to the sum $S_n$, and see that $$ \frac{S_n}{\sqrt{n}} \to Z $$ in distribution, where $Z$ is a normal random vector with mean $0$ and covariance matrix $\Sigma = \begin{bmatrix}\frac12 & 0 \\ 0 & \frac12\end{bmatrix}$. I am not very familiar with these multidimensional results but I am quite confident that from here you should be able to derive precise estimates on the distribution of $\lVert S_n \rVert$ and its moments.
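As a sketch of that last step: the norm of a centered Gaussian vector in $\mathbb{R}^2$ with covariance $\frac12 I$ has a Rayleigh distribution with scale $\sigma = 1/\sqrt{2}$, so $$ \mathbb{P}(\lVert Z \rVert \le r) = 1 - e^{-r^2} \quad \text{and} \quad \mathbb{E}\lVert Z \rVert = \int_0^\infty 2 r^2 e^{-r^2} \, dr = \frac{\sqrt{\pi}}{2} \approx 0.886. $$ To pass from convergence in distribution to convergence of the mean one needs uniform integrability, which I believe holds here because $\mathbb{E}\lVert S_n \rVert^2 = \mathbb{E}[A_n^2 + B_n^2] = n$ exactly, so the second moments of $\lVert S_n \rVert / \sqrt{n}$ are bounded. Under that assumption, $$ \mathbb{E}\Big[\sup_{t} |C(t)|\Big] = \mathbb{E}\lVert S_n \rVert \approx \frac{\sqrt{\pi}}{2} \sqrt{n} \approx 0.886 \sqrt{n} $$ for large $n$, which is consistent with the $0.9\sqrt{n}$ reported in the question.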


Let me try a possible answer; maybe someone else can jump in and hopefully confirm. I had posted a very similar question and then came across the discussion of this one.

It is indeed true that the expected value of a single singer's signal $\sin(t+d_i)$ is zero, and therefore the same holds for the expected value of the sum. As a comment on my original question pointed out, one needs to distinguish between the expected value and the actual realization. It is also worth considering that your ear probably does not capture sound waves at a single infinitesimally small point, but across an area of small, finite size. Each point within that area registers the intensity (or energy) of the sound wave arriving at it, i.e., its square, so a negative sine wave hitting the area at one position cannot cancel a positive one hitting it at another. Integrating the energy across the area gives the signal that is actually converted into a nerve signal and produces the sensation of hearing.

So in a given realization of the random experiment you may indeed have total destructive interference at one or more tiny spots on the hearing area, but the other points compensate, and on average you receive the same intensity, even if you move your head slightly to the left or to the right. One may need to take into account the wavelength and the true size of the intensity-sensitive area (which are of quite different magnitudes), but dozens of sound waves with different phases overlap at each point, so a tiny shift across the hearing area might produce a fairly large phase difference in the overall received sine.
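To put a formula to the averaging claim, here is a sketch under the question's idealised model only (unit amplitudes, iid uniform phases $d_i$); it says nothing about the ear itself. For any fixed $t$, expanding the square and using independence gives $$ \mathbb{E}\big[C(t)^2\big] = \sum_{i=1}^n \mathbb{E}\big[\sin^2(t + d_i)\big] + \sum_{i \ne j} \mathbb{E}\big[\sin(t + d_i)\big]\, \mathbb{E}\big[\sin(t + d_j)\big] = \frac{n}{2} + 0 = \frac{n}{2}. $$ So although the expected signal is zero, the expected squared signal grows linearly in $n$: it is $n$ times the value $\frac12$ obtained for a single singer.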

I think that makes sense from a mathematical point of view; it would be great if someone could confirm that from a physiological point of view as well.