Sample Standard Deviation vs. Population Standard Deviation

There are, in fact, two different formulas for standard deviation: The population standard deviation $\sigma$ and the sample standard deviation $s$.

If $x_1, x_2, \ldots, x_N$ denote all $N$ values from a population, then the (population) standard deviation is $$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2},$$ where $\mu$ is the mean of the population.

If $x_1, x_2, \ldots, x_N$ denote $N$ values from a sample, however, then the (sample) standard deviation is $$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2},$$ where $\bar{x}$ is the mean of the sample.
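
As a quick check of the two formulas, here is a minimal Python sketch that uses only the standard library. The helper names `population_sd` and `sample_sd` are hypothetical, and the data list is made up for illustration. The results are compared against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics


def population_sd(values):
    """sigma = sqrt((1/N) * sum((x_i - mu)^2)), treating values as the whole population."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s = sqrt((1/(N-1)) * sum((x_i - xbar)^2)), treating values as a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(population_sd(data), statistics.pstdev(data))  # both 2.0
    print(sample_sd(data), statistics.stdev(data))       # both about 2.138
```

On this data the sum of squared deviations from the mean of 5 is 32, so $\sigma = \sqrt{32/8} = 2$ while $s = \sqrt{32/7} \approx 2.138$. For the same numbers (as long as they are not all equal), dividing by $N-1$ always gives the larger value.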

The reason for the change in formula with the sample is this: When you're calculating $s$, you are normally using $s^2$ (the sample variance) to estimate $\sigma^2$ (the population variance). The problem, though, is that if you don't know $\sigma$, you generally don't know the population mean $\mu$ either, and so you have to use $\bar{x}$ in the place in the formula where you normally would use $\mu$. Doing so introduces a slight bias into the calculation: Since $\bar{x}$ is calculated from the sample, the values of $x_i$ are on average closer to $\bar{x}$ than they would be to $\mu$. In fact, $\bar{x}$ is the value of $c$ that minimizes $\sum_{i=1}^N (x_i - c)^2$, so the sum of squares $\sum_{i=1}^N (x_i - \bar{x})^2$ can never be larger than $\sum_{i=1}^N (x_i - \mu)^2$, and it turns out to be smaller on average. It just so happens that this bias can be corrected exactly by dividing by $N-1$ instead of $N$, provided the $x_i$ are independent draws from the population. (Proving this is a standard exercise in an advanced undergraduate or beginning graduate course in statistical theory.) The technical term here is that $s^2$ (because of the division by $N-1$) is an unbiased estimator of $\sigma^2$. The correction applies to the variance, though: $s$ itself still slightly underestimates $\sigma$ on average.
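
The bias can also be seen numerically. The sketch below is a Monte Carlo simulation under assumed settings: normally distributed data with $\mu = 0$ and $\sigma = 2$ (so $\sigma^2 = 4$), samples of size $N = 5$, and an arbitrary fixed seed. The function name `simulate_variance_estimators` is hypothetical. It averages three estimates of $\sigma^2$ over many simulated samples. If the reasoning above holds, dividing $\sum (x_i - \mu)^2$ by $N$ and dividing $\sum (x_i - \bar{x})^2$ by $N-1$ should both average close to 4, while dividing $\sum (x_i - \bar{x})^2$ by $N$ should average close to $\frac{N-1}{N}\sigma^2 = 3.2$.

```python
import random


def simulate_variance_estimators(n, trials, mu=0.0, sigma=2.0, seed=12345):
    """Average three variance estimates over many simulated samples of size n.

    Returns a dict mapping each estimator name to its mean over all trials:
      'known_mu'            : (1/n)     * sum((x - mu)^2), uses the true population mean
      'divide_by_n'         : (1/n)     * sum((x - xbar)^2), biased low
      'divide_by_n_minus_1' : (1/(n-1)) * sum((x - xbar)^2), the sample variance s^2
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    totals = {"known_mu": 0.0, "divide_by_n": 0.0, "divide_by_n_minus_1": 0.0}
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss_mu = sum((x - mu) ** 2 for x in sample)      # squares about the true mean
        ss_xbar = sum((x - xbar) ** 2 for x in sample)  # squares about the sample mean
        totals["known_mu"] += ss_mu / n
        totals["divide_by_n"] += ss_xbar / n
        totals["divide_by_n_minus_1"] += ss_xbar / (n - 1)
    return {name: total / trials for name, total in totals.items()}


if __name__ == "__main__":
    results = simulate_variance_estimators(n=5, trials=50_000)
    print("true sigma^2 = 4.0")
    for name, value in results.items():
        print(f"{name:>20}: {value:.3f}")
```

Changing the seed makes the averages wobble slightly, but the divide-by-$N$ estimate stays near 3.2 rather than 4, which is the $\frac{N-1}{N}$ shrinkage that the $N-1$ divisor undoes.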

Another way to think about it is that with a sample you have $N$ independent pieces of information. However, since $\bar{x}$ is the average of those $N$ pieces, the residuals $x_i - \bar{x}$ always sum to zero: $\sum_{i=1}^N (x_i - \bar{x}) = 0$. That means if you know $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_{N-1} - \bar{x}$, you can figure out what $x_N - \bar{x}$ is. So when you're squaring and adding up the residuals, there are really only $N-1$ independent pieces of information there, and in that sense perhaps dividing by $N-1$ rather than $N$ makes sense. The technical term here is that there are $N-1$ degrees of freedom in the residuals $x_i - \bar{x}$.
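
Because $\bar{x}$ is the average of the data, the residuals $x_i - \bar{x}$ always sum to zero, and that is exactly what lets the last residual be recovered from the other $N-1$. The small sketch below checks this directly. The helper names `residuals` and `recover_last_residual` are hypothetical, and the data is made up; `fractions.Fraction` is used so the arithmetic is exact and the residuals sum to exactly zero rather than to a tiny floating-point error.

```python
from fractions import Fraction


def residuals(values):
    """Return the residuals x_i - xbar as exact Fractions."""
    xs = [Fraction(v) for v in values]
    xbar = sum(xs) / len(xs)
    return [x - xbar for x in xs]


def recover_last_residual(first_residuals):
    """Given x_1 - xbar, ..., x_{N-1} - xbar, return x_N - xbar.

    This works because all N residuals sum to zero.
    """
    return -sum(first_residuals, Fraction(0))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    r = residuals(data)
    print([str(v) for v in r])                      # ['-3', '-1', '-1', '-1', '0', '0', '2', '4']
    print(sum(r) == 0)                              # True
    print(recover_last_residual(r[:-1]) == r[-1])   # True
```

Only the first $N-1$ residuals are free to vary; the last one is pinned down by the constraint, which is the "one lost degree of freedom."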

For more information, see Wikipedia's article on the sample standard deviation.
