Why use a Kalman filter instead of keeping a running average

Yes, it is an oversimplified example, more misleading than educational.

If so, what's an example where a running average doesn't suffice?

Any case where the signal is changing.

Imagine a moving vehicle. Calculating an average means we assume the signal value from every moment in time to be equally important. That is obviously wrong. Intuition says the last measurement is more reliable than one from an hour before.

A very nice example to experiment with is a first-order system of the form $\frac{1}{sT + 1}$. It has only one state, so the equations won't get complicated.

In discrete time it could look like this:

x[n] = Ax[n-1] + Bu[n] + w[n]
y[n] = Cx[n] + v[n]
A = 0.99,   B = 1,   C = 1
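One way to read the $A = 0.99$ above is as the exact discretization of $\frac{1}{sT+1}$: sampling with period $\Delta t$ gives $A = e^{-\Delta t / T}$, and $\Delta t / T = 0.01$ yields $A \approx 0.99$. A quick Python check (the ratio $\Delta t / T = 0.01$ is my assumption, not something stated above):

```python
import math

T = 1.0    # plant time constant (assumed)
dt = 0.01  # sample time (assumed)

# Exact zero-order-hold discretization of a first-order pole at -1/T
A = math.exp(-dt / T)
print(A)  # approximately 0.99
```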

Here is some code that uses it (sorry it's Matlab; I haven't used Python recently):

%% Initialize space
N = 100;               % nr of iterations
x = zeros(N,1);
u = zeros(N,1);
yv = zeros(N,1);

xprio = zeros(N,1); % a-priori     xk|k-1
xpost = zeros(N,1); % a-posteriori xk|k
Pprio = zeros(N,1);
Ppost = zeros(N,1);
K = zeros(N,1);

%------------------------ Variables to play with:
modelError = -0.04;     % relative model error
Q = 0.01;               % variance of process disturbance
R = 0.1;                % variance of measurement noise
x(1) = 0.5;             % initial state of plant
xpost(1) = 1;           % initial estimate (state of Kalman filter)
Ppost(1) = 0.001;       % initial error covariance (state of Kalman filter)
%------------------------

% Plant
Areal = 0.99;
B = 1;
C = 1;
% Model of plant
Amodel = Areal*(1+modelError); % model never describes reality perfectly

% Generate noise (Q and R are variances, so take square roots here)
w = sqrt(Q)*randn(N,1);
v = sqrt(R)*randn(N,1);

%% Iterate
for k = 2:N
    % simulate plant
    x(k) = Areal*x(k-1) + B*u(k-1) + w(k);
    % measurement
    yv(k) = C*x(k) + v(k);

    % prediction: predict current state from previous state and control
    xprio(k) = Amodel*xpost(k-1)+B*u(k-1);
    Pprio(k) = Amodel*Ppost(k-1)*Amodel' + Q;

    % correction: use measurements with proper weight (K)
    K(k) = Pprio(k)*C / (C*Pprio(k)*C' + R);
    xpost(k) = xprio(k) + K(k)*(yv(k) - C*xprio(k));
    Ppost(k) = (1 - K(k)*C)*Pprio(k);
end

%% Plot results
figure;
subplot(2,1,1);
plot(x,'k');
hold on
plot(yv,'kx');
plot(xpost,'r');
legend('x real','x measure','x estimated');

% Important to see how K changes with time
subplot(2,1,2);
plot(K,'b')
legend('K');
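Since Python came up, here is a minimal NumPy port of the loop above (plotting omitted, $u = 0$ throughout). The seed is my choice, and I treat Q and R as variances, taking square roots when generating the noise:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen for reproducibility

N = 100
Areal, B, C = 0.99, 1.0, 1.0
Amodel = Areal * (1 - 0.04)     # model never describes reality perfectly
Q, R = 0.01, 0.1                # process / measurement noise variances

x = np.zeros(N); yv = np.zeros(N)
xpost = np.zeros(N); Ppost = np.zeros(N); K = np.zeros(N)
x[0], xpost[0], Ppost[0] = 0.5, 1.0, 0.001

w = np.sqrt(Q) * rng.standard_normal(N)
v = np.sqrt(R) * rng.standard_normal(N)

for k in range(1, N):
    # simulate plant and measurement
    x[k] = Areal * x[k - 1] + w[k]
    yv[k] = C * x[k] + v[k]
    # prediction: propagate previous estimate through the model
    xprio = Amodel * xpost[k - 1]
    Pprio = Amodel * Ppost[k - 1] * Amodel + Q
    # correction: weight the measurement by the Kalman gain
    K[k] = Pprio * C / (C * Pprio * C + R)
    xpost[k] = xprio + K[k] * (yv[k] - C * xprio)
    Ppost[k] = (1 - K[k] * C) * Pprio

print(K[-1])  # the gain settles to a steady value
```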

Some tips:

  • Always set Q and R greater than zero.
    The case $Q = 0$ is a VERY BAD example. It tells the filter "there is no disturbance acting on the plant", so after a while the filter will trust only the predictions of its model rather than looking at the measurements. Mathematically speaking, $K_k \to 0$. As we know, models never describe reality perfectly.
  • Experiment with some model inaccuracy (modelError)
  • Change initial guess of the state (xpost(1)) and see how fast it converges for different Q, R, and initial Ppost(1)
  • Check how the filter gain K changes over time depending on Q and R
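The first tip can be seen numerically without any data at all: the covariance/gain recursion does not depend on the measurements, so with $Q = 0$ the gain just decays. A small Python sketch using the values from the script above:

```python
# Gain recursion only: no measurements are needed to watch K -> 0.
A, C, R = 0.99, 1.0, 0.1  # values from the script above
Q = 0.0                   # the "VERY BAD" choice from the first tip
Ppost = 0.001
K = 1.0
for _ in range(2000):
    Pprio = A * Ppost * A + Q             # predicted covariance
    K = Pprio * C / (C * Pprio * C + R)   # Kalman gain
    Ppost = (1.0 - K * C) * Pprio         # corrected covariance
print(K)  # essentially zero: the filter has stopped listening
```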

In fact, they are the same thing in a certain sense. I will show you something behind the Kalman filter, and you may be surprised.

Consider the simplest problem of estimation. We are given a series of measurements $z_1, z_2, \cdots, z_k$ of an unknown constant $x$. We assume the additive model \begin{eqnarray} z_i= x + v_i, \; i=1,2, \cdots, k ~~~~~~~~~~~ (1) \end{eqnarray} where the $v_i$ are measurement noises. If nothing else is known, then everyone will agree that a reasonable estimate of $x$ given the $k$ measurements is the average \begin{eqnarray} \hat{x}_k= \frac{1}{k} \sum_{i=1}^{k} z_i ~~~~~~~~~~~ ~~~~~~~~~~~ (2) \end{eqnarray}

Now we can re-write Eq. (2) by a simple algebraic manipulation to get \begin{eqnarray} \hat{x}_k= \hat{x}_{k-1} + \frac{1}{k} (z_k-\hat{x}_{k-1}) ~~~~~~~~~~~ (3) \end{eqnarray} Eq. (3), which is simply Eq. (2) expressed in recursive form, has an interesting interpretation. It says that the best estimate of $x$ after $k$ measurements is the best estimate of $x$ after $k-1$ measurements plus a correction term. The correction term is the difference between what you expect to measure based on the first $k-1$ measurements, i.e. $\hat{x}_{k-1}$, and what you actually measure, $z_k$.
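Eq. (3) is easy to verify numerically: the recursive form reproduces the batch average exactly. A small Python sketch (the measurement values are made up):

```python
# Recursive average, Eq. (3), vs. batch average, Eq. (2)
z = [2.0, 3.5, 1.0, 4.2, 3.3]  # arbitrary measurements (made up)

xhat = 0.0
for k, zk in enumerate(z, start=1):
    # correction = (1/k) * (measured - expected)
    xhat = xhat + (1.0 / k) * (zk - xhat)

print(xhat, sum(z) / len(z))  # the two agree
```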

If we label the correction gain $\frac{1}{k}$ as $P_k$, then again a simple algebraic manipulation gives the recursive form \begin{eqnarray} P_k=P_{k-1}-P_{k-1}(P_{k-1}+1)^{-1}P_{k-1} ~~~~~~~~~~~ (4) \end{eqnarray}
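One can check Eq. (4) directly: starting from $P_1 = 1$ (since $P_k = \frac{1}{k}$), the recursion keeps producing $\frac{1}{k}$. A short Python check:

```python
# Recursion (4) started from P_1 = 1 generates P_k = 1/k
P = 1.0  # P_1 = 1/1
for k in range(2, 11):
    P = P - P * P / (P + 1.0)       # Eq. (4), scalar form
    assert abs(P - 1.0 / k) < 1e-12  # matches 1/k at every step
print(P)  # P_10 = 1/10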

Believe it or not, Eqs.(3-4) can be recognized as the Kalman filtering equations for this simple case.

Any discussion is welcome.

Reference:

Yu-Chi Ho, "Explaining Filtering (Estimation) in One Hour, Ten Minutes, One Minute, and One Sentence".


To give some flavor, see this list of books:

http://www.cs.unc.edu/~welch/kalman/kalmanBooks.html

I have Grewal & Andrews (with Matlab), and also Grewal, Weill & Andrews on GPS.

GPS is the fundamental example. Here is a simplified one: I interviewed for a job where they were writing software to keep track of all trucks going in and out of a huge delivery yard, for Walmart or the like. They had two types of information. Based on an RFID device in each truck, they had pretty good information about the direction each truck was going, with measurements possible many times per second, but with error that eventually grows, as in any essentially ODE-based approximation. On a much longer time scale, they could take the GPS position of a truck, which gives a very good unbiased location but with a large variance; you get the position to within 100 meters or so. How to combine these? That is the main use of the Kalman filter: when you have two sources of information with roughly opposite types of error. My idea, which I would have told them if they had paid me, was to place a device on each semi where the cab meets the trailer, giving the current turning radius. This could have been integrated to give very good short-time information about the direction the truck was heading.

Well, that is what they do with almost anything moving nowadays. The one I thought was cute was farms in India keeping track of where their tractors were. The moving body does not need to be moving rapidly to raise the same questions. But, of course, the first major use was the NASA Apollo project... My father met Kalman at some point. Dad worked mostly on navigation, initially missiles for the Army, later submarines for the Navy.


A running average is one kind of Kalman filter. Following the notation in your first link, $\hat{X}_k=K_kZ_k+(1-K_k)\hat{X}_{k-1}$, a running average sets $K_k=\frac 1k$. If your underlying model is that the parameter of interest doesn't change with time, that is what you get. Other forms are needed if $X$ changes with time.