Incremental averaging

Is there a way to incrementally calculate (or estimate) the average of a vector (a set of numbers) without knowing their count in advance?

For example you have a = [4 6 3 9 4 12 4 18] and you want to get an estimate of the average but you don't have all the values at hand so you want to have a running average without keeping all the previous values available.


You need to keep at least two pieces of information: the number of terms so far and the running mean (or something equivalent to it).

Let's suppose the $n$th component of the vector is $a_n$ and the running mean up to this is $m_n$ so $$m_n= \frac{1}{n}\sum_{i=1}^n a_i.$$

Starting with $m_0=0$, you can use the obvious single pass $$m_n = \frac{(n-1)m_{n-1}+a_n}{n}$$ but precision errors are likely to be smaller if you use the equivalent $$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}.$$


Intuition

It's an easy question and @Henry basically answered it. However, I think it would be nice to add some intuition on the second equation:

$$\mu_k=\mu_{k-1}+\frac{x_k-\mu_{k-1}}{k}$$

The idea is to represent the new value $x_k$ value by a part that is equal to the previous mean $\mu_{k-1}$ plus the remaining part $x_k-\mu_{k-1}$. The new mean $\mu_k$ we then contains $k$ times the previous mean and one time the remaining part of the new value, divided by the total count. Okay, so how to get there?

Derivation

Let's say you already have the mean $\mu_{k-1}$ for the elements $x_0,...,x_{k-1}$. Of course, we can easily incorporate an additional value $x_k$ by undoing the division, adding the new value, and redoing the scaling (but the new count):

$$\mu_k=\frac{(k-1)*\mu_{k-1}+x_k}{k}$$

Now, let's represent the new value by the previous mean and the difference to the previous mean:

$$x_k=\mu_{k-1}+(x_k-\mu_{k-1})$$

Putting that into our first equation:

$$\mu_k=\frac{(k-1)*\mu_{k-1}+\mu_{k-1}+(x_k-\mu_{k-1})}{k}$$

Look how we now got $k$ times the previous mean:

$$\mu_k=\frac{k*\mu_{k-1}+(x_k-\mu_{k-1})}{k}$$

The nice thing is that we don't have to undo and redo the division on the previous mean anymore:

$$\mu_k=\mu_{k-1}+\frac{x_k-\mu_{k-1}}{k}$$

As you can see, we still have to do the division on the difference of the new value to the previous mean. Since the new value comes in "sum-space" already, we don't have to undo anything on it, just divde by the total count.

Application

An application of this form of incremental mean can be found in the field of reinforcement learning where a value function is averaged over many experienced rewards. In this setting, you can think of $x_k-\mu_{k-1}$ as a temporal-difference error term. Since we usually don't know the total number of experiences here, we multiply by a small learning rate rather than dividing by $k$.