Derivation of Gauss-Newton

This question is about the derivation of Gauss-Newton for solving a non-linear least-squares problem, specifically in a computer-science context. The objective function is $\| f(x) \|_2^2$. To see how the increment $\Delta x$ is determined, $f$ is first expanded via the Taylor series $$ f(x + \Delta x) = f(x) + f'(x)\Delta x + \frac{1}{2!}f''(x){\Delta x}^2 + \cdots$$ (all of these $x$ are vectors). Substituting the first-order expansion into the objective function gives $$\| f(x) + \mathbf{J}(x)\Delta x \|_2^2.$$ Expanding this expression yields $$\begin{aligned} (f+\mathbf{J} \Delta)^\mathrm{T} (f+\mathbf{J} \Delta) = f^\mathrm{T}f + f^\mathrm{T} \mathbf{J}\Delta + \Delta^\mathrm{T} \mathbf{J}^\mathrm{T}f+\Delta^\mathrm{T} \mathbf{J}^\mathrm{T}\mathbf{J}\Delta. \end{aligned}$$ Then, taking the derivative with respect to $\Delta$, I get $$f^\mathrm{T} \mathbf{J}+ \mathbf{J}^\mathrm{T}f+\mathbf{J}^\mathrm{T}\mathbf{J}\Delta+\Delta^\mathrm{T} \mathbf{J}^\mathrm{T}\mathbf{J}.$$ This is the step I am not sure about, because I cannot obtain the expected result $\mathbf{J}f + \mathbf{J}\mathbf{J}^\mathrm{T}\Delta$. Maybe I am misremembering the rules of matrix calculus.
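As a sanity check on the matrix-calculus rules: treating $f$ as a vector-valued residual with Jacobian $\mathbf{J}$ (both held fixed), the gradient of $\| f + \mathbf{J}\Delta \|_2^2$ with respect to $\Delta$ is $2(\mathbf{J}^\mathrm{T}f + \mathbf{J}^\mathrm{T}\mathbf{J}\Delta)$. The following Python/NumPy sketch (illustrative random values, not from the original post) compares that formula against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
f = rng.standard_normal(m)         # f(x), treated as a constant vector
J = rng.standard_normal((m, n))    # Jacobian of f at x
delta = rng.standard_normal(n)

def phi(d):
    """Objective ||f + J d||_2^2."""
    r = f + J @ d
    return r @ r

# gradient from the matrix-calculus identity: 2 (J^T f + J^T J delta)
grad = 2 * (J.T @ f + J.T @ J @ delta)

# central finite differences along each coordinate direction
eps = 1e-6
num = np.array([(phi(delta + eps * e) - phi(delta - eps * e)) / (2 * eps)
                for e in np.eye(n)])

print(np.max(np.abs(grad - num)))  # should be tiny (phi is quadratic)
```

Since $\phi$ is quadratic in $\Delta$, central differences have no truncation error here, so the two gradients agree to roundoff.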


Solution 1:

Your expansion is not written correctly (you have to use the Taylor series in several dimensions): \begin{align} f({\bf x}+{\bf h})=f({\bf x})+{\bf J}^{\text T}{\bf h}+o({\bf h}) \end{align} I write ${\bf h}$ instead of $\triangle$ to emphasize that ${\bf h}$ is a vector. Also, $f$ is a scalar function, therefore $f^{\text{T}}=f$, and note that ${\bf J}^{\text{T}}{\bf h}={\bf{h}}^{\text{T}}{\bf J}$. You have to evaluate one half of the square: \begin{align} \frac{1}{2}(f+{\bf J}^{\text{T}}{\bf h})(f+{\bf J}^{\text{T}}{\bf h})=\frac{f^2}{2}+f\underbrace{{\bf J}^{\text{T}}{\bf h}}_{{\bf{h}}^{\text{T}}{\bf J}}+\frac{1}{2}{\bf{h}}^{\text{T}}{\bf J}{\bf J}^{\text{T}}{\bf h} \end{align} The derivatives of the second and third terms can be obtained from the general definition of the first derivative: \begin{align} a({\bf h}+\text{d}{\bf h})=a({\bf h})+(\text{d}{\bf h})^{\text{T}}\frac{\partial a}{\partial {\bf h}}+o(\text{d}{\bf h}) \end{align} The second term is ${\bf h}^{\text{T}}f{\bf J}$, i.e. its derivative is $f{\bf J}$; the last term has derivative ${\bf JJ}^{\text{T}}{\bf h}$, which is seen from \begin{align} \frac{1}{2}({\bf{h}}+{\text{d}\bf{h}})^{\text{T}}{\bf{JJ}}^{\text{T}}({\bf{h}}+{\text{d}\bf{h}})=_{\text{linear terms only}} \frac{1}{2}\left[({\text{d}\bf{h}})^{\text{T}}{\bf{JJ}}^{\text{T}}{\bf h} + {\bf h}^{\text{T}}{\bf{JJ}}^{\text{T}}({\text{d}\bf{h}})\right]={\bf h}^{\text{T}}{\bf{JJ}}^{\text{T}}({\text{d}\bf{h}})=({\text{d}\bf{h}})^{\text{T}}{\bf{JJ}}^{\text{T}}{\bf h} \end{align} where the last equality is due to the symmetry of ${\bf JJ}^{\text{T}}$. All together: \begin{align} f{\bf J}+{\bf JJ}^{\text{T}}{\bf h}. \end{align}
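This result can also be verified numerically. In this solution's convention, $f$ is a scalar and $\mathbf{J}$ is its gradient vector, so the claimed derivative of $\frac{1}{2}(f+\mathbf{J}^{\text{T}}\mathbf{h})^2$ with respect to $\mathbf{h}$ is $f\mathbf{J}+\mathbf{J}\mathbf{J}^{\text{T}}\mathbf{h}$. A minimal finite-difference check (Python/NumPy, with arbitrary test values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
f = 0.7                         # scalar value f(x)
J = rng.standard_normal(n)      # gradient of f at x (a vector in this convention)
h = rng.standard_normal(n)

def half_square(h):
    """One half of the square: (1/2) (f + J^T h)^2."""
    return 0.5 * (f + J @ h) ** 2

# claimed derivative: f J + J J^T h
grad = f * J + np.outer(J, J) @ h

# central finite differences along each coordinate direction
eps = 1e-6
num = np.array([(half_square(h + eps * e) - half_square(h - eps * e)) / (2 * eps)
                for e in np.eye(n)])

print(np.max(np.abs(grad - num)))  # should be tiny
```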

Solution 2:

The Gauss-Newton method is applied to minimize an objective function of the form $\phi(\mathbf{x}) = \| \mathbf{r}(\mathbf{x}) \|^2$.

At the $k$-th iterate, GN approximates the residual using a Taylor expansion $$ \mathbf{r}_{GN}(\mathbf{x}) = \mathbf{r}_k + \mathbf{J}_k \left( \mathbf{x}-\mathbf{x}_k \right) \simeq \mathbf{r}(\mathbf{x}) $$ with the notation $\mathbf{r}_k = \mathbf{r}(\mathbf{x}_k)$ and the Jacobian $\mathbf{J}_k = \mathbf{J}(\mathbf{x}_k)$.

The differential now reads $ d\phi_{GN} = 2\,\mathbf{r}_{GN}(\mathbf{x}) \cdot \mathbf{J}_k \, d\mathbf{x} $, and thus the derivative is $$ \frac{\partial \phi_{GN}}{\partial \mathbf{x}} = 2\mathbf{J}_k^T \mathbf{r}_{GN}(\mathbf{x}) $$

Setting the derivative to zero, i.e. $\mathbf{J}_k^T \left( \mathbf{r}_k + \mathbf{J}_k (\mathbf{x} - \mathbf{x}_k) \right) = \mathbf{0}$, yields the GN update \begin{eqnarray*} \mathbf{x}_{k+1} &=& \mathbf{x}_k - \left( \mathbf{J}_k^T \mathbf{J}_k \right)^{-1} \mathbf{J}_k^T \mathbf{r}_k \end{eqnarray*}