In high school statistics, why does it seem like equations come out of the sky?

This is a serious observation from my experience taking AP Statistics. We are given a bunch of unexplained formulas and are expected to make sense of them in real-world applications without knowing why they hold. In the algebra classes, I can at least bridge my number-sense intuition with the material to construct an equation. In statistics we don't use such reasoning to arrive at equations; instead, the equations are given and we have to think of them in an applications-based context. When I ask my teacher, he doesn't seem very receptive to the idea of explaining the equations from pure common sense; instead he talks about what the symbols mean.


Once upon a time an illiterate was recruited as a student by Prestigious University because he was a talented football player and they badly wanted him on the team. On a midterm exam they asked him "When was the War of 1812?" He said "I don't know", and that was undeniably correct, so he passed instead of flunking out, so he could stay on the team, and everyone lived happily ever after.

Math is a subject that everyone is required to take, including those who don't want to learn math. That is a crime. A consequence is that math courses are taught the way that football player's course was taught. That way they can get everybody through. Let's say students are told that variance is $$ \frac1n\sum_{i=1}^n (x_i-\bar x)^2,\text{ where }\bar x = \frac{x_1+\cdots+x_n}{n}. \tag1 $$ (The version with $n-1$ in the denominator instead of $n$ is used ONLY when estimating a population variance from a sample, and the conventional argument for doing that is immensely weaker than most who teach statistics seem to suspect, and at any rate I have a reason for avoiding it here.)

An intelligent student who wants to understand will wonder why the square root of the quantity in $(1)$ is used instead of the mean absolute deviation $$ \frac1n\sum_{i=1}^n |x_i - \bar x|. \tag2 $$ So what happens if the teacher tries to explain that? Students say "Why do we have to learn this? That other instructor doesn't make his students learn this. Learning that is too high a price to pay to get an 'A' in this course!" They regard learning the material as a price they pay to get a grade rather than as the thing they showed up for. The reasonable solution is to expel them and teach a course for honest students. But politicians say "That other country over there is full of students who have surpassed ours on standardized tests showing that they've memorized this formula without understanding anything!!! They will slaughter us in a nuclear war next week if we don't make our students do the same!!!!". Everyone including professors believes every word of this and harshly condemns all who disagree and wants to pay higher taxes to make students memorize formula $(1)$ above. And one can't cover everything that will be on the standardized test and still have time to accommodate intelligent students who ask questions about things like this. They must be told to realize that students who want to become illiterate blue collar workers are who the course in statistics is for and curious and intelligent students should learn their place and not misbehave in class while their dumber classmates do what neither they nor anyone else wants to do.

I'll come back later and maybe add something about why $(1)$ rather than $(2)$. The short version: because the variance of a sum of independent random variables is the sum of the variances.

PS: There is also something less nocent involved: People in many fields may want to use the mathematical results derived in statistics without necessarily being able to understand the theory.

PPS: Alright, why is the square root of $(1)$ used as a measure of dispersion, instead of $(2)$? Both have the property that if you multiply $x_i$ by $c$ for $i=1,\ldots,n$, you multiply the measure of dispersion by $|c|$. They also have the property that adding $c$ to each $x_i$ does not alter the measure of dispersion. Those two properties together make them both measures of dispersion.
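The two properties just described can be checked numerically. The following is only a small sketch: the function names `sd` and `mad`, the sample list, and the constant `c` are made up for illustration and are not part of any course formula sheet.

```python
from math import isclose, sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Square root of formula (1): divide by n, not n - 1."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def mad(xs):
    """Formula (2): mean absolute deviation from the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]  # arbitrary illustrative data
c = -3.0                   # arbitrary illustrative constant

scaled = [c * x for x in xs]
shifted = [x + c for x in xs]

for measure in (sd, mad):
    # Multiplying every value by c multiplies the dispersion by |c|.
    assert isclose(measure(scaled), abs(c) * measure(xs))
    # Adding c to every value leaves the dispersion unchanged.
    assert isclose(measure(shifted), measure(xs))
```

Both measures pass both checks, so these two properties alone cannot explain why one is preferred over the other.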

So find the variance of this set of numbers: $1,2,3$, and also this one: $1,2,3,4$. Now pick one number at random from each set, independently, with every number in a set equally likely. What's the sum of the two independent random variables? Look at this addition table, in which every cell is equally likely: $$ \begin{array}{|r|ccc|} \hline & 1 & 2 & 3 \\ \hline 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \\ \hline \end{array} $$ The sums are: $2,3,3,4,4,4,5,5,5,6,6,7$. Find the variance of the numbers in that list. It will be the sum of the variances found above.
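For anyone who would rather check the arithmetic by machine, here is a short sketch using exact fractions. The helper names `variance` and `mean_abs_dev` are my own. It also runs the same experiment with formula $(2)$, to show that the mean absolute deviation does not add up in the same way.

```python
from fractions import Fraction
from itertools import product


def variance(xs):
    """Formula (1): divide by n, using exact rational arithmetic."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((x - m) ** 2 for x in xs) / n


def mean_abs_dev(xs):
    """Formula (2): mean absolute deviation from the mean."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum(abs(x - m) for x in xs) / n


a = [1, 2, 3]
b = [1, 2, 3, 4]
# Every cell of the addition table, each equally likely.
sums = sorted(x + y for x, y in product(a, b))

print(sums)            # [2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7]
print(variance(a))     # 2/3
print(variance(b))     # 5/4
print(variance(sums))  # 23/12, which equals 2/3 + 5/4

print(mean_abs_dev(a))     # 2/3
print(mean_abs_dev(b))     # 1
print(mean_abs_dev(sums))  # 7/6, which is not 2/3 + 1
```

The variances add exactly; the mean absolute deviations do not.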

This can be shown to work generally, and that's not hard.
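Here is a sketch of the general argument, for the same setup as the addition table: one list $x_1,\ldots,x_m$ and another list $y_1,\ldots,y_n$, with every one of the $mn$ pairs equally likely. (The names $m$, $n$, $y_j$, and $\bar y$ are notation I'm introducing for this sketch.) The mean of all the sums $x_i+y_j$ is $\bar x+\bar y$, so the variance of the sums is $$ \frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n \big((x_i-\bar x)+(y_j-\bar y)\big)^2 = \frac1m\sum_{i=1}^m (x_i-\bar x)^2 + \frac{2}{mn}\left(\sum_{i=1}^m (x_i-\bar x)\right)\left(\sum_{j=1}^n (y_j-\bar y)\right) + \frac1n\sum_{j=1}^n (y_j-\bar y)^2. $$ The middle term vanishes because $\sum_{i=1}^m (x_i-\bar x)=0$, and what remains is the sum of the two variances as defined in $(1)$. No such cancellation is available for the absolute values in $(2)$.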

A consequence is that one can apply the central limit theorem to sums like $X_1+\cdots+X_n$ and one knows what the variance of the sum of the many random variables will be.

PPPS: That the sum of the variances equals the variance of the sum, when independent random variables are added, doesn't work when you use $n-1$ rather than $n$. That's what I had in mind when I said I have a reason to avoid that here.
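A quick check of that claim on the same two lists, $1,2,3$ and $1,2,3,4$, again as a sketch with exact fractions. The function name `variance_n_minus_1` is just a label I chose for the $n-1$ version.

```python
from fractions import Fraction
from itertools import product


def variance_n_minus_1(xs):
    """Same as formula (1) but with n - 1 in the denominator."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


a = [1, 2, 3]
b = [1, 2, 3, 4]
# All twelve equally likely sums from the addition table.
sums = [x + y for x, y in product(a, b)]

print(variance_n_minus_1(a))     # 1
print(variance_n_minus_1(b))     # 5/3
print(variance_n_minus_1(sums))  # 23/11, which is not 1 + 5/3 = 8/3
```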


  1. Statistics is one of the parts of mathematics that would be useful to the large majority of citizens, so it is worth teaching even in non-specialized classes.

  2. Concepts in probability are very difficult to define. They are easy to explain in a superficial way, but very difficult to formalize.

  3. The theorems are quite difficult to prove; they require advanced mathematical knowledge.