Aggregating standard deviation to a summary point

Solution 1:

General Solution

To compute the mean, variance, and standard deviation, you only need to keep track of three sums $s_0, s_1, s_2$, defined as follows for a set of values $X$:

$$(s_0, s_1, s_2) = \sum_{x \in X} (1, x, x^2)$$
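In code, the three sums can be accumulated in a single pass over the data. Here is a minimal Python sketch. The function name `running_sums` is just an illustrative choice.

```python
def running_sums(values):
    """Return (s0, s1, s2): the count, the sum, and the sum of squares of values."""
    s0 = s1 = s2 = 0
    for x in values:
        s0 += 1      # one per value
        s1 += x      # the value itself
        s2 += x * x  # the square of the value
    return s0, s1, s2

print(running_sums([2, 4, 4, 4, 5, 5, 7, 9]))  # (8, 40, 232)
```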

In English, $s_0$ is the number of values, $s_1$ is the sum of the values, and $s_2$ is the sum of the square of each value. Given these sums, we can now derive mean (average) $\mu$, variance (population) $\sigma^2$, and standard deviation (population) $\sigma$:

$$\mu = \frac{s_1}{s_0} \qquad \sigma^2 = \frac{s_2}{s_0} - \left(\frac{s_1}{s_0}\right)^2 \qquad \sigma = \sqrt{\frac{s_2}{s_0} - \left(\frac{s_1}{s_0}\right)^2}$$

In English, the variance is the average of the square of each value minus the square of the average value.
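A minimal Python sketch of deriving the population statistics from the sums follows. The name `stats_from_sums` is hypothetical. Clamping the variance at zero is my own addition, not part of the formula. It guards against a tiny negative result from floating-point rounding when all values are (nearly) equal.

```python
import math

def stats_from_sums(s0, s1, s2):
    """Population mean, variance, and standard deviation from (s0, s1, s2)."""
    mu = s1 / s0
    var = s2 / s0 - mu * mu
    # Rounding can push a true zero slightly below zero; clamp it.
    var = max(var, 0.0)
    return mu, var, math.sqrt(var)

print(stats_from_sums(8, 40, 232))  # (5.0, 4.0, 2.0)
```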

Your particular case

You have $s_0, \mu, \sigma$, so you need to compute $s_1$ and $s_2$ by solving the above for those variables:

$$s_1 = s_0\mu \qquad s_2 = s_0\left(\mu^2 + \sigma^2\right)$$
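As a small Python sketch (the function name `sums_from_summary` is hypothetical), this assumes the reported $\sigma$ is the population standard deviation. If your data instead holds a sample standard deviation $s$ (divided by $n-1$), you would first convert it with $\sigma^2 = s^2 (n-1)/n$.

```python
def sums_from_summary(n, mu, sigma):
    """Recover (s0, s1, s2) from a count, population mean, and population SD."""
    return n, n * mu, n * (mu * mu + sigma * sigma)

print(sums_from_summary(8, 5.0, 2.0))  # (8, 40.0, 232.0)
```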

Once you have $s_0, s_1, s_2$ for each data set, aggregation is just a matter of adding the corresponding sums together and deriving the desired aggregate values from those sums.
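Putting the pieces together, here is a minimal Python sketch that combines several `(count, mean, population SD)` summaries. The function name `combine` and the two groups are hypothetical. The groups are the summaries of `[2, 4, 4, 4]` and `[5, 5, 7, 9]`, so the result should match the statistics of all eight values pooled together.

```python
import math

def combine(summaries):
    """Combine (n, mean, population SD) triples into one aggregate triple."""
    s0 = s1 = s2 = 0.0
    for n, mu, sigma in summaries:
        # Convert each summary back to its sums, then add the sums.
        s0 += n
        s1 += n * mu
        s2 += n * (mu * mu + sigma * sigma)
    mu = s1 / s0
    var = max(s2 / s0 - mu * mu, 0.0)  # clamp rounding noise below zero
    return s0, mu, math.sqrt(var)

# Summaries of the hypothetical groups [2, 4, 4, 4] and [5, 5, 7, 9]
groups = [(4, 3.5, math.sqrt(0.75)), (4, 6.5, math.sqrt(2.75))]
print(combine(groups))  # approximately (8.0, 5.0, 2.0)
```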

Variance Equation Derivation

We start with the standard definition of population variance, where $n = s_0$ is the number of values and $\mu = s_1/s_0$ is the mean, and go from there:

$$\sigma^2 = \frac{1}{n}\sum_{x \in X} \left(x - \mu\right)^2 = \frac{1}{s_0}\sum_{x \in X} \left(x - \frac{s_1}{s_0}\right)^2$$

$$= \frac{1}{s_0}\sum_{x \in X} \left(x^2 - 2x\frac{s_1}{s_0} + \left(\frac{s_1}{s_0}\right)^2\right) = \frac{1}{s_0}\sum_{x \in X} x^2 - 2\frac{s_1}{s_0^2}\sum_{x \in X} x + \frac{s_1^2}{s_0^3}\sum_{x \in X} 1 $$

$$= \frac{1}{s_0}(s_2) - 2\frac{s_1}{s_0^2}(s_1) + \frac{s_1^2}{s_0^3}(s_0) = \frac{s_2}{s_0} - 2\frac{s_1^2}{s_0^2} + \frac{s_1^2}{s_0^2} = \frac{s_2}{s_0} - \left(\frac{s_1}{s_0}\right)^2 $$
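A quick numerical sanity check of the derivation is to compute the variance both ways, from the definition and from the sums, and compare. This Python sketch uses hypothetical function names and random test data.

```python
import random

def definitional_variance(values):
    """Population variance straight from the definition: mean of (x - mu)^2."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def shortcut_variance(values):
    """Population variance via s2/s0 - (s1/s0)^2."""
    s0 = len(values)
    s1 = sum(values)
    s2 = sum(x * x for x in values)
    return s2 / s0 - (s1 / s0) ** 2

random.seed(0)
data = [random.uniform(-10, 10) for _ in range(1000)]
# The two values agree up to floating-point rounding.
print(definitional_variance(data), shortcut_variance(data))
```

One caveat: when the values are large relative to their spread, $s_2/s_0$ and $(s_1/s_0)^2$ are nearly equal, and the subtraction loses precision in floating point. Since variance does not change when every value is shifted by the same constant, subtracting a rough estimate of the mean from each value before accumulating the sums avoids most of that loss.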

Solution 2:

You say you take the average of all the averages, but I notice that you have a sample count column. Are these averages over different sample sizes? If so, then you would probably want a weighted average for your aggregate average: $$\text{Aggregate Average} =\frac{\sum_i (\text{sample size})_i(\text{average})_i}{\sum_i (\text{sample size})_i}$$

But without knowing more about the data, I cannot say for sure.
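For concreteness, here is a minimal Python sketch of the weighted average. The function name and the two rows are hypothetical.

```python
def weighted_average(sizes, averages):
    """Aggregate average: each group's average weighted by its sample size."""
    return sum(n * a for n, a in zip(sizes, averages)) / sum(sizes)

# Hypothetical rows: 10 samples averaging 2.0 and 30 samples averaging 6.0
print(weighted_average([10, 30], [2.0, 6.0]))  # 5.0
```

The plain average of the two averages would be 4.0. That gives the 10-sample row as much influence as the 30-sample row.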

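Now, standard deviation is just the square root of the variance. Over an entire population, it is defined by $$\sigma = \sqrt{\frac{\sum_{i=1}^N(x_i - \bar x)^2}{N}}$$ where $\bar x = \left(\sum_{i=1}^N x_i\right) /N$ is the average. Note that the variance $\sigma^2$ is itself an average, but each group's $\sigma^2_i$ measures spread around that group's own average, not around the aggregate average. So averaging the variances alone is not enough: you also have to account for how far each group's average lies from the aggregate average. Assuming that I am right about needing to include sample sizes, you want: $$ \text{Aggregate } \sigma^2 = \frac{\sum_i (\text{sample size})_i\left[\sigma^2_i + \left((\text{average})_i - \text{Aggregate Average}\right)^2\right]}{\sum_i (\text{sample size})_i}$$ The second term in the brackets vanishes only when every group has the same average. In that special case, the aggregate variance reduces to the weighted average of the $\sigma^2_i$. Then you just take the square root to get the aggregate standard deviation. (If I am wrong about needing to include sample size, then you use the same equation with each sample size $= 1$.)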
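To see why the spread of the group averages matters, here is a small Python check with two hypothetical groups whose averages differ. It compares three quantities: the sample-size-weighted average of the group variances, the weighted spread of the group averages around the aggregate average, and the variance of all the raw values pooled together. The pooled variance is computed with the standard library's `statistics.pvariance`.

```python
import statistics

# Hypothetical raw data for two groups with different averages.
a = [2.0, 4.0, 4.0, 4.0]
b = [5.0, 5.0, 7.0, 9.0]
groups = [(len(g), statistics.fmean(g), statistics.pvariance(g)) for g in (a, b)]

total = sum(n for n, _, _ in groups)
mean = sum(n * m for n, m, _ in groups) / total  # aggregate average

# Weighted average of the within-group variances only.
within = sum(n * v for n, _, v in groups) / total
# Weighted spread of the group averages around the aggregate average.
between = sum(n * (m - mean) ** 2 for n, m, _ in groups) / total

pooled = statistics.pvariance(a + b)
print(within, between, pooled)  # 1.75 2.25 4.0
```

Here the weighted average of the variances alone (1.75) understates the pooled variance (4.0). Adding the between-group term (2.25) makes up the difference.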