Least-squares solution to system of equations of $4 \times 4$ matrices with $2$ unknown matrices

This question is in the context of a robotics problem. The goal is to track a robot using both its onboard odometry system and a VR system (HTC Vive Pro) using a VR controller mounted to the robot.

What is known is the transformation between odometry origin and the robot (measurements $A_n$) and between the VR origin and the controller (measurements $C_n$). Ignoring both systems' inaccuracies for now, we can also assume the robot's pose according to both systems is identical ($I$). Driving around will result in many pairs of measurements ($A_n$, $C_n$).

What is unknown is the fixed transformation $B$ between the two coordinate systems and the fixed transformation $D$ between the robot's origin and the VR controller. All transformations involved are proper rigid.

The chain of transformation looks as follows.

[Figure: Transformation graph for a robot tracked by both a VR system and wheel odometry]

This leaves us with a system of equations in $4 \times 4$ homogeneous transformation matrices

$$A_n \cdot B \cdot C_n \cdot D = I$$

with $I$ being the $4 \times 4$ identity matrix and $(A_n, C_n)$ being (many) pairs of measurements.

I am looking for the optimal (least-squares, I suppose) solution for $B$ and $D$, so that the equation holds approximately true for all pairs of measurements.


Solution 1:

Too long for a comment.

What follows is a Mathematica script that solves this problem numerically as a nonlinear least-squares problem. The optimization can be carried out with a sequential quadratic programming approach.

Clear["Global`*"]

(* parameterize the rotations of B and D by Euler angles *)
mB = EulerMatrix[{b[1], b[2], b[3]}];
tB = Array[tb, {3, 1}];
mD = EulerMatrix[{d[1], d[2], d[3]}];
tD = Array[td, {3, 1}];
(* assemble the 4x4 homogeneous transformation matrices HB and HD *)
hB = Join[mB, {{0, 0, 0}}];
hD = Join[mD, {{0, 0, 0}}];
HB = Transpose[Join[Transpose[hB], Transpose[Join[tB, {{1}}]]]];
HD = Transpose[Join[Transpose[hD], Transpose[Join[tD, {{1}}]]]];

n = 10;
SeedRandom[2];
(* random test data; real measurements would be rigid transformations *)
rA = Table[RandomReal[{-1, 1}, {4, 4}], {n}];
rC = Table[RandomReal[{-1, 1}, {4, 4}], {n}];
(* sum of squared entries of the residual A.B.C.D - I over all pairs *)
obj = Total[Flatten[Sum[(rA[[k]].HB.rC[[k]].HD - IdentityMatrix[4])^2, {k, 1, n}]]];
vars = {b[1], b[2], b[3], tb[1, 1], tb[2, 1], tb[3, 1], d[1], d[2], d[3], td[1, 1], td[2, 1], td[3, 1]};

sol = NMinimize[obj, vars]
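For readers without Mathematica, here is a rough Python equivalent of the script above (my own translation, not part of the original answer), using `scipy.optimize.least_squares` and the same Z-Y-Z Euler-angle parameterization. As in the script, the test data are random matrices rather than rigid transformations.

```python
import numpy as np
from scipy.optimize import least_squares

def euler_matrix(alpha, beta, gamma):
    # Z-Y-Z convention, matching Mathematica's default EulerMatrix
    def rz(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    def ry(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rz(alpha) @ ry(beta) @ rz(gamma)

def homogeneous(angles, t):
    # 4x4 homogeneous transform from Euler angles and a translation
    H = np.eye(4)
    H[:3, :3] = euler_matrix(*angles)
    H[:3, 3] = t
    return H

def residuals(p, As, Cs):
    # stack the entries of A.B.C.D - I over all measurement pairs
    B = homogeneous(p[0:3], p[3:6])
    D = homogeneous(p[6:9], p[9:12])
    return np.concatenate([(A @ B @ C @ D - np.eye(4)).ravel()
                           for A, C in zip(As, Cs)])

rng = np.random.default_rng(2)
n = 10
# random test data, as in the Mathematica script; real measurements
# would be rigid transformations
As = [rng.uniform(-1, 1, (4, 4)) for _ in range(n)]
Cs = [rng.uniform(-1, 1, (4, 4)) for _ in range(n)]

sol = least_squares(residuals, np.zeros(12), args=(As, Cs))
print(sol.cost)
```

The 12 entries of `sol.x` correspond to the Mathematica variable list `vars` in the same order.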

Solution 2:

I'll start with explaining how I pose the problem to see if we agree on that (there is no point in discussing algorithms until we have agreed on what they should achieve).

I will define the abstract "navigational measurement" $\mathcal A$ as the pair of $a\in\mathbb R^3$ and a proper (determinant $1$) rotation $A\in SO(3)$. Note that this pair can be understood in two different ways, i.e., as the transformation $x\mapsto a+Ax$ (which is the one I'll consider and the one to which all my formulas will be adjusted) or as the transformation $x\mapsto A(a+x)$ (which would need adjustments in everything I say). I will say that the navigational system is functioning satisfactorily if the measurement $a,A$ it makes deviates from the truth $\bar a,\bar A$ by at most $\varepsilon_A$ in $A$ ($\|A-\bar A\|\le\varepsilon_A$) and at most $\varepsilon_a$ in $a$ ($|a-\bar a|\le\varepsilon_a$).

The first important thing to understand is that if you switch to the $x\mapsto A(a'+x)$ representation with truth $\bar a'=\bar A^*\bar a$ (for orthogonal matrices the inverse is the same as the transpose), then a satisfactorily functioning navigational system in my sense will produce an error in $a'$ of order $|\bar a|\varepsilon_A+\varepsilon_a$, so the measurement of $a'$ will be allowed to have a huge error when $\bar a$ is large, and vice versa. So you should be really sure which absolute error (in $a$ or $a'$) you want to control. If you confuse them, your program will simply not be doing the right thing. The right choice depends on the physical setup, so you are the only person who can tell. The rule is pretty simple: write out the conversion from $a,A$ to the coordinate vector in some stationary coordinate system and see which of the vectors $a$ or $a'$ allows you to recover that coordinate vector independently of $A$. Note that you may end up with different answers for the robot and the controller, in which case everything I say will have to be modified appropriately.
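A small numerical illustration of this point (my own toy example in NumPy, not part of the original answer): the two representations describe the same map via $a'=A^*a$, but a rotation error of size $\varepsilon_A$ perturbs $a'$ by roughly $|\bar a|\varepsilon_A$.

```python
import numpy as np

def rot_z(theta):
    # rotation by theta about the vertical axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

A = rot_z(0.7)
a = np.array([100.0, 0.0, 0.0])   # a large shift, |a| = 100
x = np.array([1.0, 2.0, 3.0])

# the two representations describe the same map, with a' = A* a
a_prime = A.T @ a
assert np.allclose(a + A @ x, A @ (a_prime + x))

# a rotation error of eps_A = 0.01 leaves a alone but moves a'
# by about |a| * eps_A = 1
A_noisy = rot_z(0.7 + 0.01)
err = np.linalg.norm(A_noisy.T @ a - a_prime)
print(err)   # close to 1.0
```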

As I said, I'll use $x\mapsto a+Ax$ for the direct mapping (and, correspondingly, the other choice for the inverse). Thus, the dimensionless quadratic deviation of the measurement $\mathcal A$ from the truth $\bar{\mathcal A}$ is $\varepsilon_A^{-2}\|A-\bar A\|^2+\varepsilon_a^{-2}|a-\bar a|^2$ and we assume that the mean of this quantity is bounded by $1$.

It is a bit awkward to carry two different $\varepsilon$'s around, so I will just change the unit of length to make them equal (i.e., if $\varepsilon_A=0.01$ and $\varepsilon_a=0.1\,$m, then my standard unit of length will become $10$ meters instead of $1$ meter). Then I'll just have $$ \Delta(\mathcal A,\bar{\mathcal A})=\|A-\bar A\|^2+|a-\bar a|^2\le \varepsilon^2 $$ in the mean.

Now suppose that we have two satisfactorily functioning navigational systems. Suppose that they can be reconciled by your loop by the transformations $\bar{\mathcal B}$ and $\bar{\mathcal D}$ in the sense that for the true readings $\bar{\mathcal A}_k$, $\bar{\mathcal C}_k$ we have $\bar{\mathcal A}_k\bar{\mathcal B}\bar{\mathcal C}_k\bar{\mathcal D}=I$ all the time. However, we do not know the truth, only the approximate measurements ${\mathcal A}_k$ and ${\mathcal C}_k$, and if we plug those into this equation, even with the correct $\mathcal B=\bar{\mathcal B}$ and $\mathcal D=\bar{\mathcal D}$, the error $\varepsilon$ will be amplified by the lengths of the shifts, so an attempt to minimize this sum of squares directly would be quite a bad idea.

The correct formulation, in my opinion, is instead the following: can we find $\mathcal B$ and $\mathcal D$ and "plausible true measurements" $\mathcal A'_k, \mathcal C'_k$ such that $\mathcal A'_k\mathcal B\mathcal C'_k\mathcal D=I$ exactly for every $k$ and $$ \Delta(\mathcal A_k,\mathcal A_k')+\Delta(\mathcal C_k,\mathcal C_k')\le\varepsilon^2 $$ in the mean? If we can, we will just say that the navigational systems can be reconciled. If not, we will conclude that (at least) one of them is malfunctioning and reset them or stop the robot operation.

Thus, mathematically, we do indeed have a least-squares problem, but it is not the one that was originally proposed!

I will prefer to write $\mathcal B$ and $\mathcal D$ in the second form, $x\mapsto B(b+x)$ and $x\mapsto D(d+x)$, for reasons that will become clear later. Note that here we do not want to find something close to the truth, just something that explains all observations no worse than the truth does, if the truth exists (and that will therefore work just as well, as long as we do not drive far beyond the observed range). I'll come back to this point later.

Now let us see what it is. We have two sets of equations that should hold exactly: $A_k'BC_k'D=I$ and $a_k'+A'_kB(b+c'_k)+A'_kBC'_kDd=0$. Using the first set of equations, we can rewrite the second one (multiplying by $(A'_kB)^*=C'_kD$) as $$ C_k'D(a'_k+d)+(b+c'_k)=0\,. $$ Now let $V_k=(C_k')^*$; then $A_k'$ is uniquely determined from the first set once we know $V_k,B,D$, and $\|V_k^*-C_k\|^2+\|A_k'-A_k\|^2$ is comparable to $\|A_kBC_kD-I\|^2+\|V_k^*-C_k\|^2$ up to a factor of $4$, so the latter is an equivalent deviation measurement. Thus it will be convenient to forget about $A_k'$ altogether and declare that we want to make $\|A_kBC_kD-I\|^2+\|V_k^*-C_k\|^2$ small.
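As a sanity check on this rewriting, one can verify numerically (a NumPy sketch of my own) that data satisfying the two original equations exactly also satisfy the rewritten translation equation $C'D(a'+d)+(b+c')=0$:

```python
import numpy as np

def rand_rot(rng):
    # a random proper rotation via QR with a determinant fix
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, [0, 1]] = Q[:, [1, 0]]
    return Q

rng = np.random.default_rng(0)
B, C, D = rand_rot(rng), rand_rot(rng), rand_rot(rng)
b, c, d = (rng.normal(size=3) for _ in range(3))

# build exactly consistent data:
#   A'BC'D = I  and  a' + A'B(b + c') + A'BC'D d = 0
A = (B @ C @ D).T                  # forces A B C D = I
a = -(A @ B @ (b + c) + d)         # forces the translation equation

# the rewritten form C'D(a' + d) + (b + c') then vanishes as well
lhs = C @ D @ (a + d) + (b + c)
assert np.allclose(lhs, 0.0, atol=1e-12)
```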

Now comes the second part. It is $V_k^*D(a_k'+d)+(b+c'_k)=0$. Note that we can freely shift $c_k'$ by $\varepsilon$ from $c_k$, so we can be in good shape if we find $a_k',V_k,D,b,d$ so that $$ |V_k^*D(a_k'+d)+(b+c_k)|<\varepsilon $$ or, which is the same, $$ |(a_k'+d)+D^*V_k(b+c_k)|<\varepsilon. $$ Now we see that we can adjust $a_k$ to $a_k'$ by at most $\varepsilon$ too as long as $$ |(a_k+d)+D^*V_k(b+c_k)|<\varepsilon $$

Thus (up to a close to $1$ constant factor) we can say that all discrepancies can be explained by observation errors if and only if $$ \|A_kBC_kD-I\|^2+\|V_k^*-C_k\|^2+|(a_k+d)+D^*V_k(b+c_k)|^2\le\varepsilon^2 $$ in the mean for some choice of $B,D,b,d,V_k$.
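For concreteness, here is a direct transcription of this objective into NumPy (my own sketch; the argument names mirror the symbols above), together with a check that exactly consistent data give a zero value:

```python
import numpy as np

def objective(B, D, b, d, Vs, As, avs, Cs, cvs):
    """Mean discrepancy: B, D, V_k are rotations; b, d, a_k, c_k are shifts."""
    I = np.eye(3)
    terms = [np.linalg.norm(A @ B @ C @ D - I, 'fro')**2
             + np.linalg.norm(V.T - C, 'fro')**2
             + np.linalg.norm((a + d) + D.T @ V @ (b + c))**2
             for A, a, C, c, V in zip(As, avs, Cs, cvs, Vs)]
    return np.mean(terms)

def rand_rot(rng):
    # a random proper rotation via QR with a determinant fix
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, [0, 1]] = Q[:, [1, 0]]
    return Q

# exactly consistent data should give a zero objective
rng = np.random.default_rng(0)
B, D = rand_rot(rng), rand_rot(rng)
b, d = rng.normal(size=3), rng.normal(size=3)
Cs = [rand_rot(rng) for _ in range(5)]
cvs = [rng.normal(size=3) for _ in range(5)]
As = [(B @ C @ D).T for C in Cs]                         # A B C D = I
avs = [-d - D.T @ C.T @ (b + c) for C, c in zip(Cs, cvs)]  # third term = 0
Vs = [C.T for C in Cs]                                   # V_k = C_k^*
print(objective(B, D, b, d, Vs, As, avs, Cs, cvs))  # ~ 0
```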

This is exactly the quadratic minimization problem that we will be interested in. Notice that it has many nice features. First of all, we can shift all $a_k$ by some vector $u$ and shift $d$ by $-u$, which will change nothing. Thus we may assume that the mean of $a_k$ is zero. This shift is irrelevant for the theory, but does help with numerics, so I advise doing it before running the algorithm (and similarly for $c_k$ and $d$).

Second, in each coordinate separately ($V_k$, $B$, $D$, $(b,d)$) with the other ones fixed, it is a nice minimization problem. In $(b,d)$ it is classical linear least squares, but in $V_k$, $B$, $D$ it is also a tractable minimization problem on the proper orthogonal group, of the same kind for each of them. Say we want to minimize in $B$ with the other variables fixed. Using $\|R-I\|_F^2=2(3-\operatorname{Tr}R)$ for $R\in SO(3)$ and the cyclic invariance of the trace, we can write the $B$-related part as $$ \frac 1n\sum_{k=1}^n \|A_kBC_kD-I\|^2=\frac 2n\sum_{k=1}^n\bigl(3-\operatorname{Tr}(A_kBC_kD)\bigr) \\ = 6-2\operatorname{Tr}\Bigl(\Bigl[\frac 1n\sum_{k=1}^n C_kDA_k\Bigr]B\Bigr)\,, $$ i.e., we just need to figure out how to maximize the trace $\operatorname{Tr}(XB)$ for an arbitrary given $3\times 3$ matrix $X$ over proper rotations $B$. The other two separate optimizations (for $V_k$ and $D$) turn out to be of the same kind.
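The reduction to trace maximization is easy to verify numerically. A small NumPy check of my own (recall $\|R-I\|_F^2=2(3-\operatorname{Tr}R)$ for $R\in SO(3)$, so the constant term is $6$) that the $B$-dependent part collapses to a single trace with $X=\frac1n\sum_k C_kDA_k$:

```python
import numpy as np

def rand_rot(rng):
    # a random proper rotation via QR with a determinant fix
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, [0, 1]] = Q[:, [1, 0]]
    return Q

rng = np.random.default_rng(0)
B, D = rand_rot(rng), rand_rot(rng)
As = [rand_rot(rng) for _ in range(7)]
Cs = [rand_rot(rng) for _ in range(7)]

# ||R - I||_F^2 = 2(3 - Tr R) on SO(3), and Tr is cyclic, so the
# B-part of the objective is an affine function of Tr(X B)
lhs = np.mean([np.linalg.norm(A @ B @ C @ D - np.eye(3), 'fro')**2
               for A, C in zip(As, Cs)])
X = np.mean([C @ D @ A for A, C in zip(As, Cs)], axis=0)
rhs = 6.0 - 2.0 * np.trace(X @ B)
assert np.isclose(lhs, rhs)
```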

So suppose that we have an oracle that, given $X$, tells us the corresponding $B$ (I'll tell you eventually how that oracle really works, but I don't want to deprive you of the pleasure of figuring it out by yourself before it is time). Then the natural idea would be to optimize $V_k$, then $(b,d)$, then $B$ (note that the last 2 can be done in any order), then $D$, and then to cycle. That is what I tried first. It didn't work! One reason was that the movement was frustratingly slow and the other was that that natural descent terminated at some local minimum instead of the global one about 70% of the time. So I did something else instead in which this cycling was only a part of the process.
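For anyone who prefers to skip the puzzle: a standard construction for such an oracle (not necessarily the one the author has in mind) is the SVD-based solution of the orthogonal Procrustes problem, restricted to proper rotations by a determinant correction. A NumPy sketch:

```python
import numpy as np

def best_rotation(X):
    # argmax of Tr(X B) over proper rotations B, via the SVD of X
    # (orthogonal Procrustes with a determinant correction)
    U, S, Vt = np.linalg.svd(X)
    s = np.sign(np.linalg.det(U @ Vt))
    return Vt.T @ np.diag([1.0, 1.0, s]) @ U.T

def rand_rot(rng):
    # a random proper rotation via QR with a determinant fix
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, [0, 1]] = Q[:, [1, 0]]
    return Q

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 3))
B = best_rotation(X)

# B is a proper rotation, and no random rotation beats it
assert np.allclose(B @ B.T, np.eye(3)) and np.isclose(np.linalg.det(B), 1.0)
assert all(np.trace(X @ B) >= np.trace(X @ rand_rot(rng)) - 1e-9
           for _ in range(500))
```

The sign correction puts the $-1$ (when needed) against the smallest singular value, which is what keeps the result in $SO(3)$ at minimal cost in the trace.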

I'll talk later about what exactly was done if you or somebody else still cares, but now I want to talk a bit about 3 kinds of robots for which I believe any proposed algorithm should work and on which it should be tested.

The first one (free robot) can rotate in all directions (so $A_k$ and $C_k$ can be arbitrary elements of $SO(3)$). This is often not the case: even for a flying drone, the rotation about the vertical axis is arbitrary but the bank is limited to small angles. The second one (axis robot) is like all freely moving ground machinery, where the only really admissible rotation is around the vertical axis. Finally, the third one is rigid (allowing translations only, or rotations through very limited angles). The important thing to understand is that in the second and third cases the solution is not unique even if all the data are exact.

What happens in the second case can be understood when $A_k$ and $C_k$ lie in some commutative subgroup of rotations $\Omega$. Then we can take any element $\omega$ of $\Omega$ and replace $B,D$ by $\omega B$ and $D\omega$. That would not change $ABCD$ at all. Now the equation $CD(a+d)+(b+c)=0$ may or may not help to determine $\omega$, depending on whether all $a_k=c_k=0$ or not. Normally you would have some shifts, but if you don't, you can just change $d$ to $\omega^*d$. At any rate, notice that the rotation axis vector $v$ is an eigenvector with eigenvalue one for the entire group $\Omega$, so you can always change $d$ to $d+tv$ and $b$ to $b-tv$. The situation with the rigid robot allows even more freedom (it can be understood by looking at the case when all $A=C=I$). This freedom shows that trying to find the "true" $B,D,b,d$ with high precision may be impossible or difficult when the motion range is severely limited. However, as I said, that goal is not what is really needed: what is needed is just a reconciling pair of transformations that explains all the data, and that will be just as good as the truth for all practical purposes as long as we do not extend the operating range.
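The rigid-robot ambiguity is easy to demonstrate numerically (my own NumPy sketch): with all $A_k=C_k=I$, the rotation constraint only forces $BD=I$, and the pair $(b,d)$ can be shifted along $(b,d)\mapsto(b-Du,\,d+u)$ for any vector $u$ without changing any residual.

```python
import numpy as np

def rand_rot(rng):
    # a random proper rotation via QR with a determinant fix
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, [0, 1]] = Q[:, [1, 0]]
    return Q

rng = np.random.default_rng(4)
D = rand_rot(rng)
B = D.T                        # with A = C = I, the rotation part forces B D = I
b, d = rng.normal(size=3), rng.normal(size=3)

def residual(a, c, D, b, d):
    # translation constraint for the rigid robot (C = I): D(a + d) + (b + c)
    return D @ (a + d) + (b + c)

a, c = rng.normal(size=3), rng.normal(size=3)
u = rng.normal(size=3)         # an arbitrary gauge shift
r1 = residual(a, c, D, b, d)
r2 = residual(a, c, D, b - D @ u, d + u)
assert np.allclose(B @ D, np.eye(3))
assert np.allclose(r1, r2)     # (b, d) and (b - Du, d + u) are indistinguishable
```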

OK, long enough for today. So let me ask you whether you agree with the proposed setting. If yes, I'll continue and we'll discuss the algorithm itself (it just passed the 5000-simulation benchmark, so it is apparently halfway up to the Japanese standards of performance, where the magic number is $10,000$ :-) ).