Why does Pearson's chi-squared test divide by the mean and not the variance?
I am wondering why in Pearson's chi-squared test, the divisor of each element in the sum is the matching expectation and not the matching variance.
As I understand it, the test works by standardizing each (approximately normal) variable before summing, so that the resulting sum of squares can be compared against the chi-squared distribution, which describes a sum of squares of standard normal random variables.
A normal random variable is standardized by subtracting the expectation and dividing by the standard deviation; after squaring, that puts the variance in the denominator. So, in Pearson's test, I would expect the divisor of each term to be the variance, not the expectation.
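To write out what I mean in symbols (standard notation: $k$ cells, $n$ trials, observed counts $O_i$, expected counts $E_i = np_i$): since each count has variance $np_i(1-p_i)$, the standardization I describe would give

$$\sum_{i=1}^{k}\frac{(O_i - np_i)^2}{np_i(1-p_i)},$$

whereas Pearson's statistic divides by $E_i = np_i$ alone:

$$\sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}.$$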
Solution 1:
Intuition/informal proof: The expected value is equal to the variance, so when you divide by the expected value you are in fact dividing by the variance, as you thought you should. If you think of it in terms of counts that follow a Poisson distribution, this is natural, since the mean and variance of a $\operatorname{Poisson}(\lambda)$ distribution are both $\lambda$.
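For instance, a quick simulation sketch (my own illustration; the expected count of 20 is an arbitrary choice) makes the mean-equals-variance point concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 20.0                            # expected count E_i for one cell (arbitrary choice)
counts = rng.poisson(lam, size=100_000)

print(np.mean(counts))                # ~20: the expectation
print(np.var(counts))                 # ~20 as well: for Poisson counts, variance equals mean

# Dividing by sqrt(E_i) therefore gives an (approximately) standard normal
# variable, which is exactly what the chi-squared construction needs.
z = (counts - lam) / np.sqrt(lam)
print(np.mean(z), np.var(z))          # ~0 and ~1
```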
For a formal proof, check out MIT's OpenCourseWare.
Great question!
Solution 2:
The mathematical proof referenced by Jonathan Christensen in the other answer is great.
Here is my intuitive interpretation:
I was also deeply confused, because every "simple" explanation out there refers to the Poisson distribution, which did not feel right intuitively: the underlying process should be binomial. I too initially thought that the chi-squared test would make more sense if the divisor were $np_iq_i$ instead of $np_i$ (i.e. $E_i$).
After reading the proof, I now understand it much better. Long story short, we must not interpret each cell's calculation individually, because doing so causes confusion instead of giving the right intuition. The chi-squared test applies Pearson's theorem as a whole. Did you notice that we have to sum over all the cells and are not allowed to pick and choose (e.g. remove columns/rows that are not of interest)? The statistic $\sum\dfrac{(O_i - E_i)^2}{E_i}$ only converges to the $\chi^2$ distribution if all the cells (mutually exclusive and collectively exhaustive) are added together.
Individually, each cell's variance is $np_iq_i$, but the cells are not independent of each other: they sum to a fixed total, so knowing the first $n-1$ cells determines the final cell. That is, the covariance between cells is not zero; it is in fact negative, because a large value in one cell means the other cells must be smaller to compensate. Following the proof, when you sum over all the cells, the resulting distribution has to take this covariance into account. The end result (after a full two pages of maths) is that $\sum\dfrac{(O_i - E_i)^2}{E_i}$ converges to a $\chi^2$ distribution with the degrees of freedom described by the theorem. The result holds only for the full sum across all the cells; removing any would break the proof and render the theorem inapplicable.
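If you prefer to see this numerically rather than follow the two pages of maths, here is a small simulation sketch (my own illustration; the cell probabilities, sample size, and repetition count are arbitrary choices): the cells covary negatively, yet the statistic summed over all cells matches a $\chi^2$ distribution with $k-1$ degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
p = np.array([0.2, 0.3, 0.5])        # cell probabilities (k = 3 cells, arbitrary)
n = 500                               # trials per simulated table
reps = 20_000                         # number of simulated tables

counts = rng.multinomial(n, p, size=reps)   # shape (reps, 3)
expected = n * p

# The cells covary negatively: a surplus in one cell forces deficits elsewhere,
# so the off-diagonal entries of the sample covariance matrix are negative.
print(np.cov(counts, rowvar=False))

# The full Pearson statistic, summed over ALL cells of each table:
x2 = ((counts - expected) ** 2 / expected).sum(axis=1)

# Compare with chi2(k - 1) = chi2(2): mean ~2, variance ~4, and the empirical
# 95th percentile should sit near chi2.ppf(0.95, 2) ~ 5.99.
print(np.mean(x2), np.var(x2))
print(np.quantile(x2, 0.95), stats.chi2.ppf(0.95, df=2))
```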
In summary, don't lean on the "intuitive" Poisson interpretation. There is literally no mention of the Poisson distribution in the proof. Think of the chi-squared statistic as a single statistic rather than as a sum of individual statistics.
Thanks Daniel