Recursive formula for variance

Recall that, for every $n\geqslant1$, $$ \bar x_n=\frac1n\sum_{k=1}^nx_k, $$ and $$ \bar\sigma^2_n=\frac1n\sum_{k=1}^n(x_k-\bar x_n)^2=\frac1n\sum_{k=1}^nx_k^2-(\bar x_n)^2. $$ Hence simple algebraic manipulations starting from the identities $$ (n+1)\bar x_{n+1}=n\bar x_n+x_{n+1}, $$ and $$ (n+1)(\bar\sigma^2_{n+1}+(\bar x_{n+1})^2)=n(\bar\sigma^2_n+(\bar x_n)^2)+x_{n+1}^2, $$ lead to $$ \bar x_{n+1}=\bar x_n+\frac{x_{n+1}-\bar x_n}{n+1}, $$ and $$ \bar\sigma^2_{n+1}=\bar\sigma^2_n+(\bar x_n)^2-(\bar x_{n+1})^2+\frac{x_{n+1}^2-\bar\sigma^2_n-(\bar x_n)^2}{n+1}. $$ Thus, $(n,\bar x_n,x_{n+1})$ yield $\bar x_{n+1}$ and $(n,\bar\sigma^2_n,\bar x_n,\bar x_{n+1},x_{n+1})$ yield $\bar\sigma^2_{n+1}$.


There are two problems in the preceding answer, the first being the formula for the variance is incorrect(see the formula below for the correct version) and the second is that the formula for the recursion ends up subtracting large, nearly equal, numbers.

The definition for unbiased estimates of mean($\bar x$) and variance($\sigma^2$) for a sample of size n are: $$ \bar x_n=\frac1n\sum_{k=1}^nx_k, $$ and $$ \bar\sigma^2_n=\frac1{n-1}\sum_{k=1}^n(x_k-\bar x_n)^2 $$

Define the recursion variables

$$ M_n = n \bar x_n=\sum_{k=1}^nx_k, $$ and $$ S_n = (n-1)\bar\sigma^2_n=\sum_{k=1}^n(x_k-\bar x_n)^2 $$

The recursion relation for $M_{n+1}$ is obvious $$ M_{n+1} = M_n + x_{n+1} $$ and the recursion relation for $S_n$ is obtained via $$ S_{n+1} = (x_{n+1}-\bar x_{n+1})^2+\sum_{k=1}^n(x_k-\bar x_{n+1})^2\phantom{XXXXXX}\\ \phantom{S_{n+1}} = (x_{n+1}-\bar x_{n+1})^2+\sum_{k=1}^n(x_k-\bar x_n+\bar x_n-\bar x_{n+1})^2\\ \phantom{S_{n+1}XXXXXXXXXX} = (x_{n+1}-\bar x_{n+1})^2+\sum_{k=1}^n(x_k-\bar x_n)^2+2(\bar x_n-\bar x_{n+1})\sum_{k=1}^n(x_n-\bar x_n) + \sum_{k=1}^n(\bar x_n -\bar x_{n+1})^2\\ $$ And since $$S_n = \sum_{k=1}^n(x_k-\bar x_n)^2$$ $$\sum_{k=1}^n(\bar x_n-\bar x_{n+1})^2 = n(\bar x_n-\bar x_{n+1})^2$$ and $$\sum_{k=1}^n(x_k-\bar x_n) = 0$$ this simplifies to $$ S_{n+1} = S_n+(x_{n+1}-\bar x_{n+1})^2 +n(\bar x_n -\bar x_{n+1})^2 $$

Now, this recursion relation has the nice property that it $S_n$ a sum of squared terms, and thus cannot be negative. Written in terms of $M_n$ and $S_n$, the recursion relations are: $$ M_{n+1} = M_n + x_{n+1} $$ $$ S_{n+1} = S_n+\left(x_{n+1}-\frac{M_n+x_{n+1}}{n+1}\right)^2 +n\left(\frac{M_n}{n} -\frac{M_n+x_{n+1}}{n+1}\right)^2 $$ and we can further simplify the recursion relation for $S_n$ to \begin{eqnarray} S_{n+1} &= S_n + \left(\frac{n x_{n+1}-M_n}{n+1}\right)^2+n\left(\frac{M_n-n x_{n+1}}{n(n+1)}\right)^2\\ &=S_n+ \left(1+\frac1n\right)\left(\frac{n x_{n+1}-M_n}{n+1}\right)^2\\ &=S_n+ \frac{(n x_{n+1}-M_n)^2}{n(n+1)} \end{eqnarray}

So we have the simple recursion relations: $$ M_{n+1} = M_n + x_{n+1} $$ $$ S_{n+1} = S_n + \frac{(n x_{n+1}-M_n)^2}{n(n+1)}$$

with the mean given by $$\bar x_n = \frac1n M_n$$ and the unbiased estimate of the variance is given by $$\sigma_n^2 = \frac1{n+1}S_n$$.


Here are the iterative formulas (with derivations) for the population ($N$ normalized) and sample ($N-1$ normalized) standard deviations, which express the $\sigma_{n+1}$ ($s_{n+1}$ for sample) for the $n+1$ value set in terms of $\sigma_{n}$ ($s_{n}$ for sample), $\bar x_{n}$ of the $n$ value set plus the new value $x_{n+1}$ added to the set.

Essentially we need to find:

$$\bar x_{n+1} = f(n, \bar x_n, x_{n+1})$$ and $$\sigma_{n+1} = g(n, \sigma_n, \bar x_n, x_{n+1})$$

Derivation for the Average

For both cases, the average for $n\geqslant1$ is,

for $n$ values: $$ \bar x_n=\frac1n\sum_{k=1}^nx_k $$

for $n+1$ values: $$ \bar x_{n+1}=\frac1{n+1}\sum_{k=1}^{n+1}x_k = \frac1{n+1}(n\bar x_n + x_{n+1}) \leftarrow f(n, \bar x_n, x_{n+1}) $$

Derivation for the Standard Deviation

The standard deviation formulas for population and sample are:

\begin{aligned} \sigma_{n} &= \sqrt {\frac1{n} \sum_{k=1}^{n}(x_k - \bar x_{n})^2 } && \textit{for} \textbf{ population } \textit{Standard Deviation}\\ \\ s_{n} &= \sqrt {\frac1{n-1} \sum_{k=1}^{n}(x_k - \bar x_{n})^2 } && \textit{for} \textbf{ sample } \textit{Standard Deviation } \\ \end{aligned}

To consolidate the derivations for both population and sample formulas we'll write the standard deviation using a generic factor $\alpha_{n}$ and replace it at the end to get the population and sample formulas.

with:

\begin{equation} \alpha_{n} = \begin{cases} n & \textit{for} \textbf{ population } \textit{Standard Deviation} \\ n-1 & \textit{for} \textbf{ sample } \textit{Standard Deviation } \\ \end{cases} \end{equation}

the equation for the standard deviation for the $n$ values can be written as:

\begin{equation} \tag{1} \begin{aligned} \alpha_{n}\sigma^2_{n} & = \sum_{k=1}^{n}(x_k - \bar x_{n})^2 \\ & = \sum_{k=1}^{n}\big(x_k^2 - 2 x_k \bar x_n + (\bar x_{n})^2\big) \\ & = \sum_{k=1}^{n}x_k^2 - 2 \bar x_n \sum_{k=1}^{n}x_{k} + (\bar x_{n})^2\sum_{k=1}^{n}1 \\ & = \sum_{k=1}^{n}x_k^2 - 2 \bar x_n n \bar x_n + n (\bar x_{n})^2 \\ & = \sum_{k=1}^{n}x_k^2 - n (\bar x_{n})^2 \end{aligned} \end{equation}

Thus, the same equation for $n+1$ values is:

\begin{equation} \begin{aligned} \alpha_{n+1}\sigma^2_{n+1} & = \sum_{k=1}^{n+1}(x_k-\bar x_{n+1})^2 \\ & = \sum_{k=1}^{n+1}x_k^2 - (n+1)(\bar x_{n+1})^2 \\ & = \sum_{k=1}^{n}x_k^2 + (x_{n+1})^2 - (n+1)(\bar x_{n+1})^2 \\ & = \sum_{k=1}^{n}x_k^2 + (x_{n+1})^2 - (n+1) \big(\frac1{n+1}(n\bar x_{n} + x_{n+1}) \big)^2 \\ & = \sum_{k=1}^{n}x_k^2 + (x_{n+1})^2 - \frac1{n+1} \big(n^2(\bar x_{n})^2 + 2 n \bar x_{n} x_{n+1} + (x_{n+1})^2 \big) \\ \end{aligned} \end{equation}

from the equation $(1)$ we substitute $\sum_{k=1}^{n}x_k^2$ with $\alpha_{n}\sigma^2_{n} + n (\bar x_{n})^2$ and get:

\begin{equation} \begin{aligned} \alpha_{n+1}\sigma^2_{n+1} & = \alpha_{n}\sigma^2_{n} + n (\bar x_{n})^2 + (x_{n+1})^2 - \frac1{n+1} \big(n^2(\bar x_{n})^2 + 2 n \bar x_{n} x_{n+1} + (x_{n+1})^2 \big) \\ \end{aligned} \end{equation}

arranging the terms and simplifying we get:

$$ \sigma_{n+1} = \sqrt { \Big( \sigma^2_{n} + \frac{n}{n+1} \frac1{\alpha_n} (\bar x_n - x_{n+1})^2 \Big) \frac{\alpha_{n}}{\alpha_{n+1}} } \leftarrow g(n, \sigma_n, \bar x_n, x_{n+1}) $$

Replacing the $\alpha$ values, the specific iterative formulas for population and sample standard deviations are:

\begin{equation} \begin{aligned} \sigma_{n+1} &= \sqrt{ \Big( \sigma^2_{n} + \frac{1}{n+1}(\bar x_n - x_{n+1})^2 \Big) \frac{n}{n+1} } &&\textit{population STD} \\ \\ s_{n+1} &= \sqrt{ \Big( s^2_{n} + \frac{n}{n^2-1}(\bar x_n - x_{n+1})^2 \Big) \frac{n-1}{n} } &&\textit{sample STD} \\ \end{aligned} \end{equation}


I ended up using this incremental approach:

function mean(array) {
  var i = -1, j = 0, n = array.length, m = 0;
  while (++i < n) if (a = array[i]) m += (a - m) / ++j;
  return j ? m : undefined;
}

function variance(array, mean_value) {
  if (!mean_value) return undefined;
  var i = -1, j = 0, n = array.length, v = 0;
  while (++i < n) {
    a = Math.pow((array[i] - mean_value), 2)
    v += (a - v) / ++j;
  }
  return v * (n/(n-1));
}