simple example of recursive least squares (RLS)

I'm vaguely familiar with recursive least squares algorithms; all the information about them I can find is in the general form with vector parameters and measurements.

Can someone point me towards a very simple example with numerical data, e.g. $y = \hat{m}x+\hat{b}$ for scalar measurements $x$ and $y$, and unknown parameters $\hat{m}$ and $\hat{b}$? I need to understand this well before moving on to vector examples.

edit: I've found literally dozens of papers and lecture notes on RLS parameter estimation; they're full of algebra and go into depth on the derivation of RLS and the application of the matrix inversion lemma, but none of them work through a practical example with real data. The closest I've found is the snippet below from a set of Princeton lecture notes, which discusses the scalar case of recursively estimating the mean of an unknown parameter observed with additive noise. That is useful for understanding, but so trivial that I can't figure out how I might apply it to my $y = \hat{m}x+\hat{b}$ example in practice.

[screenshot of the scalar recursive-mean example from the Princeton lecture notes]


Let me give you a simple example that captures the basic idea.

Suppose that we want to find the average of $N$ numbers. Let me call it $A(N)$. Now $$ A(N) = \frac{x_1+x_2+\cdots+x_N}{N}$$ Now imagine you have already calculated $A(N)$ and then receive a new data point. What is the average of the $N+1$ numbers?

$$ A(N+1) = \frac{x_1+x_2+\cdots+x_N+x_{N+1}}{N+1}$$

The key is that you do not have to calculate $A(N+1)$ from scratch. You can rewrite the above equation as $$ (N+1)\, A(N+1) = x_1+x_2+\cdots+x_N+x_{N+1} \\ = \left(x_1+x_2+\cdots+x_N\right)+x_{N+1}=N\, A(N)+x_{N+1}$$ Rearranging and simplifying, you get $$ A(N+1)= A(N) + \frac{1}{N+1} \left(x_{N+1}-A(N)\right)$$ This is the recursive definition. It shows how to update the average with each new data value. We can write this as $$ A_{\text{new}} = A_{\text{old}} + K \left(\text{data} - A_\text{old}\right)$$

There are two important parts to the equation above. $K$ is called the gain. $\left(\text{data} - A_\text{old}\right)$ is called the innovation and is the difference between what you get and what you expect. Note that $K$ depends on how many samples you have already processed.
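To see this recursion in action, here is a minimal sketch (the seven data values are arbitrary example numbers I made up) that runs the update and compares it with the ordinary batch mean:

data = [3.0, 4.0, 6.0, 3.0, 8.0, 7.0, 5.0]   # arbitrary example values

A = 0.0                           # running average A(N), starting with no samples
for N, x_new in enumerate(data):
    K = 1.0 / (N + 1)             # gain shrinks as more samples are absorbed
    A = A + K * (x_new - A)       # A_new = A_old + K * (data - A_old)

print(A, sum(data) / len(data))   # both give 5.142857...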

Now for recursive linear equations (I will write $y = a x + b$) you have the same structure: $$ \pmatrix{a_\text{new} \\ b_\text{new} }=\pmatrix{a_\text{old} \\ b_\text{old} } + \pmatrix{K_{1}\\K_{2}} \left(y_\text{data} - (a_\text{old}\, x_\text{data} + b_\text{old})\right)$$ The innovation is still a scalar, so the gain is now a $2\times 1$ vector.
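For reference, in the standard textbook form of RLS (assuming unit measurement-noise variance and no forgetting factor), the gain is computed from a $2\times 2$ matrix $P$ that is updated alongside the estimate, with regressor $\varphi = \pmatrix{x_\text{data} & 1}^T$:

$$ K = \frac{P_\text{old}\,\varphi}{1 + \varphi^T P_\text{old}\,\varphi}, \qquad P_\text{new} = \left(I - K\,\varphi^T\right) P_\text{old}$$

This $P$ plays the role of the $H^{-1}$ that shows up in the derivation sketched below.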

Added in response to comment

The formula for $K$ uses the matrix inversion lemma, which gives a recursive formula for $K$. The actual calculations are tedious and would take me hours to type here; consult any good book. I wanted to give you the concepts; the actual details, as with any algorithm, are all algebra. Here is the procedure:

  • Write the formula for $N$ data points and the formula for $N+1$ data points.
  • In the formula for $N+1$ data points, replace all expressions involving the first $N$ data points by the formula for $N$ data points.
  • Re-arrange and simplify.
  • You will end up with an expression of the form $H^{-1}-(H+v v^T)^{-1}$ where $v$ is a vector.
  • Use the matrix inversion lemma to get $H^{-1}-(H+v v^T)^{-1}=H^{-1}vv^TH^{-1}/(1+v^T H^{-1} v)$ (in practice it turns out to be easier to write the recurrence directly for $H^{-1}$); a quick numerical check of this identity is sketched after this list.
  • Matrix gain $K$ can then be written in terms of $H$.
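As a quick sanity check on that rank-one update identity (my own verification sketch, not taken from a particular reference), it can be confirmed numerically:

import numpy as np

# Verify H^{-1} - (H + v v^T)^{-1} = H^{-1} v v^T H^{-1} / (1 + v^T H^{-1} v)
rng = np.random.default_rng(0)
H = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
H = H @ H.T                                   # make H symmetric positive definite
v = rng.standard_normal((3, 1))

Hinv = np.linalg.inv(H)
lhs = Hinv - np.linalg.inv(H + v @ v.T)
rhs = (Hinv @ v @ v.T @ Hinv) / (1.0 + (v.T @ Hinv @ v).item())

print(np.allclose(lhs, rhs))                  # True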

As with all such algorithms...it is details, details, details. Consult any good book.

RLS is a special case of the BLUE (best linear unbiased estimator), which is itself a special case of the Kalman filter. I chose to write the gains as $K$ in honor of Kalman, who gave the recursive formula in a much broader context.

Good luck.


Details on the mathematics of this method can be found in Yang, Applied Numerical Methods Using MATLAB, p. 76. Here is an RLS implementation in Python:

import numpy as np

def rlse_online(aT_k1, b_k1, x, P):
    # One RLS update: aT_k1 is the 1x2 regressor row [x_k, 1], b_k1 is the
    # scalar measurement, x is the 2x1 parameter estimate, P is the 2x2 matrix.
    K = np.dot(P, aT_k1.T) / (np.dot(np.dot(aT_k1, P), aT_k1.T) + 1)  # gain vector
    x = x + K * (b_k1 - np.dot(aT_k1, x))                             # innovation update
    P = P - np.dot(K, np.dot(aT_k1, P))                               # covariance update
    return x, K, P

n = 2                                     # two parameters: slope and intercept
vals = np.array([[3.0, 4.0, 6.0, 3.0, 8.0, 7.0, 5.0]]).T
P = np.eye(n) * 100.                      # large initial P: little confidence in x0
x = np.zeros((n, 1))                      # initial estimate [m, b] = [0, 0]
for k in range(len(vals)):
    x, K, P = rlse_online(np.array([[k, 1]]), vals[k, :], x, P)
print(x)
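As a sanity check (my addition, not from Yang's book), the recursive result can be compared against the batch least-squares fit of the same seven points; the two should agree closely, differing only slightly because of the finite initial P:

A = np.array([[k, 1] for k in range(len(vals))], dtype=float)
x_batch, *_ = np.linalg.lstsq(A, vals, rcond=None)
print(x_batch)    # batch fit; should be close to the recursive x above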