Scaling factor and weights in Unscented Transform (UKF)
I'm trying to implement the UKF for parameter estimation as described by Eric A. Wan and Rudolph van der Merwe in Chapter 7 of the book Kalman Filtering and Neural Networks (free PDF).
I am confused by the setting of $\lambda$ (used in the selection of sigma points and in the calculation of the weights for the mean). The authors recommend setting:
$\lambda = \alpha^2(L+k) - L$, where $L$ is the dimension of $x$, $\alpha$ is "small" ($0 < \alpha < 1$), and $k$ is either 0 or $3-L$ (different sources disagree on this). The sigma points are then collected in a matrix $\chi$ with:
$\chi_{0} = x$
$\chi_{i} = x + \left(\sqrt{(L+\lambda)P_{x}}\right)_{i}$ for $i = 1, \dots, L$
$\chi_{i} = x - \left(\sqrt{(L+\lambda)P_{x}}\right)_{i-L}$ for $i = L+1, \dots, 2L$
where $\left(\sqrt{(L+\lambda)P_{x}}\right)_{i}$ is the $i$th column of the matrix square root of $(L+\lambda)P_{x}$, with $P_{x}$ the covariance matrix of $x$.
The sigma points are run through $f$:
$Y_{i} = f(\chi_{i})$ for $i = 0, \dots, 2L$
and the mean of Y is calculated as:
$\bar{Y} = \sum_{i=0}^{2L} w_{i}Y_{i}$
With the weights $w_{i}$ given as:
$$ w_{0} = \dfrac{\lambda}{L + \lambda} $$ $$ w_{i} = \dfrac{1}{2(L + \lambda)}, \quad i = 1, \dots, 2L $$
The issue I am running into is that for any reasonable values of $L$, $\alpha$ and $k$, $w_{0}$ ends up being negative (often with a very large magnitude). While the weights $w_{i}$ do sum to 1, the negative value results in the calculated mean being extremely far off. I'm sure there is something I am missing, but I can't figure out what.
Solution 1:
As far as I currently know, it is correct to get quite large negative values for $w_0$, as long as the sum over ALL weights adds up to 1. According to http://youtu.be/DWDzmweTKsQ?t=24m27s, the equations are correct. Still, I also have a strange feeling about the huge negative weight. But the UKF seems to work so far ...
Wikipedia says:
It should be noted that Julier and Uhlmann published papers using a particular parameterized form of the Unscented Transform in the context of the UKF which used negative weights to capture assumed distribution information. That form of the UT is susceptible to a variety of numerical errors that the original formulations (above) do not suffer. Julier has subsequently described parameterized forms which do not use negative weights and also are not subject to those issues.
Do you have any additional info about this?
Actually, what surprises me more is that the sum over all covariance weights is $\neq 1$, which seems wrong to me. Did you observe the same?
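For what it is worth, the covariance weights are not expected to sum to 1 in this formulation. Assuming the weights from Wan and van der Merwe's chapter, where the zeroth covariance weight is the zeroth mean weight plus $(1 - \alpha^2 + \beta)$, a quick check could look like the sketch below. The values of `L`, `alpha`, `beta` and `kappa` are illustrative choices, not taken from the question.

```python
import numpy as np

L, alpha, beta, kappa = 3, 1e-3, 2.0, 0.0   # illustrative values only
lam = alpha**2 * (L + kappa) - L

# Mean weights: w_0 = lam / (L + lam), w_i = 1 / (2 (L + lam))
w_mean = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
w_mean[0] = lam / (L + lam)

# Covariance weights (assumed form): only the zeroth entry differs
w_cov = w_mean.copy()
w_cov[0] += 1.0 - alpha**2 + beta

print(w_mean.sum())   # close to 1
print(w_cov.sum())    # close to 1 + (1 - alpha**2 + beta)
```

So the mean weights sum to 1 while the covariance weights sum to $1 + (1 - \alpha^2 + \beta)$ by construction.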
Solution 2:
Negative Weight is Fine (mostly)
The unscented transform as described by Julier et al. in "A New Method for the Nonlinear Transformations of Means and Covariances in Filters and Estimators", IEEE Transactions on Automatic Control, Vol. 45, No. 3, March 2000, explicitly allows negative weights. See Remark 2: "k can be any number, (positive or negative) providing n + k != 0." Setting k negative causes the first weight to be negative. OP's incorrect mean is likely due to an implementation bug, not a consequence of the negative weight in itself. The downside of this negative weight pertains to the covariance estimate rather than the mean estimate.
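As a sanity check on an implementation, here is a minimal NumPy sketch of the sigma points and mean weights from the question. The test function `f`, the state `x`, the covariance `P` and the defaults `alpha=1e-3`, `kappa=0.0` are assumptions made for illustration. `f` is quadratic so that the exact mean is known in closed form.

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, kappa=0.0):
    """Sigma points (as rows) and mean weights for lambda = alpha^2 (L + kappa) - L."""
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)     # S @ S.T == (L + lam) * P
    chi = np.vstack([x, x + S.T, x - S.T])    # row i uses column i of S
    w = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    w[0] = lam / (L + lam)
    return chi, w

def f(p):
    # Hypothetical test function, chosen because its exact mean is easy to derive.
    return p[0]**2 + p[1] * p[2]

x = np.array([1.0, 2.0, 0.5])
P = np.diag([0.1, 0.2, 0.3])

chi, w = sigma_points(x, P)
Y = np.array([f(c) for c in chi])
y_mean = w @ Y

# Exact mean for Gaussian x: E[x0^2] + E[x1 x2]
exact = x[0]**2 + P[0, 0] + x[1] * x[2] + P[1, 2]
print(w[0], w.sum(), y_mean, exact)
```

Here `w[0]` is roughly $-10^6$, yet the weights sum to 1 and the weighted mean agrees with the exact value up to rounding. If your own code is far off in a test like this, common things to check are whether the columns (not the rows) of the matrix square root are used and whether the factor $(L+\lambda)$ sits inside the square root.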
Consequence of Negative Weight
When k is negative, the resulting covariance estimate may not be positive semidefinite, i.e. it is not a true covariance matrix. How bad this is depends on your application. If you attempt a Cholesky factorization of such a matrix, the factorization can fail and crash your program. The authors suggest a modification of the algorithm in Appendix III to address this problem, but OP's linked algorithm from Wan and van der Merwe has not incorporated this alteration.
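A small example of how this can happen, as a sketch using the original (unscaled) weights $w_0 = k/(n+k)$, $w_i = 1/(2(n+k))$ with $n = 5$ and $k = 3 - n = -2$. The standard Gaussian prior and the function $f(x) = \lVert x \rVert^2$ are my own illustrative choices.

```python
import numpy as np

n, kappa = 5, -2.0                 # kappa chosen so that n + kappa = 3
x, P = np.zeros(n), np.eye(n)      # illustrative prior: standard Gaussian

S = np.linalg.cholesky((n + kappa) * P)
chi = np.vstack([x, x + S.T, x - S.T])
w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
w[0] = kappa / (n + kappa)         # -2/3

Y = np.array([[c @ c] for c in chi])     # f(x) = squared norm, shape (2n+1, 1)
y_mean = w @ Y                           # shape (1,)
D = Y - y_mean
P_yy = (w[:, None] * D).T @ D            # 1x1 "covariance"

print(y_mean, P_yy)   # mean near 5 (correct), variance near -10 (true value is +10)

try:
    np.linalg.cholesky(P_yy)
    cholesky_failed = False
except np.linalg.LinAlgError:
    cholesky_failed = True
print(cholesky_failed)
```

The mean comes out right, but the "variance" is negative because the central point, which lies far from the transformed mean, enters with weight $-2/3$. NumPy's Cholesky routine raises `LinAlgError` on such a matrix.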
Why Negative Weight?
The reason we wanted k to be negative (in particular k + n = 3) was so that we could have the sigma points approximately match the moments of a Gaussian random variable up to the fourth order (Appendix I), and so that the sigma points will not be too spread out (which would bring in errors from higher order terms).
I found Julier's newer paper, The Scaled Unscented Transform, from the Wikipedia citations. Julier develops the scaled unscented transform, which allows approximating a Gaussian without spreading the sigma points out too much and without using negative weights. He then gives an equivalent alternative form which does use negative weights but which, by equivalence to the original method, will always generate true covariance matrices (disregarding numerical error). The linked version contains some typos; in particular, equation (15) contradicts (24) for $i = 0$. The correct version is the one in (15), which implies that the zeroth weight is negative for small $\alpha$. Again, this is fine because of Theorem 3 in the appendix, which shows the equivalence to the true covariance matrix $P_{zz}^*$.
Solution 3:
I've been fighting with the same problem for some time now and I think that there is actually no real solution if you want to stick to the well-known algorithms. Here is what I found:
Adding to Mark's answer, in The Scaled Unscented Transform, Julier actually points out that (at least without $\beta$)
The predicted mean and covariance are accurate to the second order and $\mathbf{P'}_{yy}$ is guaranteed to be positive semidefinite if all of the untransformed weights are non-negative.
The untransformed weights in Wan and van der Merwe's paper correspond to Julier's original formulation in A New Extension of the Kalman Filter to Nonlinear Systems with $w_0 = \frac{k}{n+k}$ and $w_i = \frac{1}{2(n+k)}$. The k + n = 3 rule will thus still not result in a guaranteed positive semidefinite covariance estimate if n is larger than 3. Actually, Wan and van der Merwe suggest setting k to 0, which results in a non-negative untransformed $w_0=0$ but still gives a negative transformed $w_0$.
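To make the relation between the two sets of weights explicit, here is a short derivation. It assumes the transformed zeroth weight from The Scaled Unscented Transform, $w_0' = w_0/\alpha^2 + (1 - 1/\alpha^2)$, and $\lambda = \alpha^2(n+k) - n$ as in the question:

$$ w_0' = \frac{k}{\alpha^2(n+k)} + 1 - \frac{1}{\alpha^2} = 1 - \frac{n}{\alpha^2(n+k)} = \frac{\lambda}{n+\lambda} $$

With $k = 0$ this reduces to $w_0' = 1 - 1/\alpha^2$, which is negative for every $0 < \alpha < 1$. For example, $\alpha = 0.5$ gives $w_0' = -3$, even though the untransformed $w_0$ is 0.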
So it seems that following the standard approaches, negative weights for the mean cannot be avoided if n is larger than 3. This is also a result in A Numerical-Integration Perspective on Gaussian Filters Section IV A, where they point out that having negative weights is indeed undesirable and can mess up the stability of the filter. So most probably, you are not missing something - the problem is the (scaled) UKF.
My solution so far has been to use the version in A New Extension of the Kalman Filter to Nonlinear Systems with a small, positive k. This however also somewhat suffers from high-dimensional states since the sigma points are placed farther away from the mean the higher n is. (Which I think was the reason for introducing the scaled unscented transform to begin with...)
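To get a feeling for how the spread grows with the dimension, here is a small sketch for the unscaled form with $P = I$. The value `kappa=0.5` is just an illustrative "small, positive k", and `sigma_offset_norm` is a hypothetical helper name.

```python
import numpy as np

def sigma_offset_norm(n, kappa=0.5):
    """Distance of the first sigma point from the mean for P = I (unscaled form)."""
    S = np.linalg.cholesky((n + kappa) * np.eye(n))
    return np.linalg.norm(S[:, 0])       # equals sqrt(n + kappa)

for n in (2, 10, 50):
    print(n, sigma_offset_norm(n))
```

The distance is $\sqrt{n+k}$ standard deviations, so for $n = 50$ the sigma points already sit about seven standard deviations from the mean.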
Finally, if you feel adventurous, you can look into newer approaches for placing the sigma points. For example, here the authors tried to learn the parameters $\alpha$, $\beta$ and k and ended up with very different and sometimes unconventional choices for different systems.