Is nonlinear least squares a nonconvex optimization problem?

Linear least-squares problems are convex optimization problems.

Are nonlinear least-squares problems also convex optimization problems? Can someone please give some simple examples?


Solution 1:

It depends. Once you are in the nonlinear world, things can be convex or nonconvex. You can write a generic nonlinear least-squares problem as
$$ \min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} \|F(x)\|^2, \qquad \text{where} \quad F(x) := (f_1(x), \ldots, f_m(x)) $$
and each $f_i : \mathbb{R}^n \to \mathbb{R}$. Let's assume that they all have continuous first and second derivatives. Now the gradient of
$$ f(x) := \tfrac{1}{2} \|F(x)\|^2 = \tfrac{1}{2} \sum_{j=1}^m f_j(x)^2 $$
is
$$ \nabla f(x) = \sum_{j=1}^m f_j(x) \nabla f_j(x) = J(x)^T F(x), $$
where $J(x)$ is the Jacobian of $F$, i.e., the $m$-by-$n$ matrix whose $j$-th row is $\nabla f_j(x)^T$:
$$ J(x) = \begin{bmatrix} \nabla f_1(x)^T \\ \vdots \\ \nabla f_m(x)^T \end{bmatrix}. $$
Now let's compute the second derivatives of $f$ (its Hessian). It's easiest to use the expression of $\nabla f(x)$ as a sum (above) and differentiate that:
$$ \nabla^2 f(x) = \sum_{j=1}^m f_j(x) \nabla^2 f_j(x) + \sum_{j=1}^m \nabla f_j(x) \nabla f_j(x)^T = \sum_{j=1}^m f_j(x) \nabla^2 f_j(x) + J(x)^T J(x). $$
The last term, $J(x)^T J(x)$, is always a positive semi-definite matrix. If the problem were a linear least-squares problem, each individual Hessian $\nabla^2 f_j(x)$ would be zero, so $\nabla^2 f(x)$ would itself be positive semi-definite. In that case, $f$ is convex.
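To make the formulas concrete, here is a minimal NumPy sketch that checks the gradient identity $\nabla f(x) = J(x)^T F(x)$ against central finite differences; the particular residuals $F$ used here are just an illustrative choice, not anything canonical:

```python
import numpy as np

# An illustrative choice of residuals F(x) = (f1(x), f2(x)) with n = m = 2.
def F(x):
    return np.array([x[0] ** 2 + x[1] - 1.0,
                     np.sin(x[0]) * x[1]])

def J(x):
    # Jacobian of F: row j is grad f_j(x)^T.
    return np.array([[2.0 * x[0],          1.0],
                     [np.cos(x[0]) * x[1], np.sin(x[0])]])

def f(x):
    # f(x) = 0.5 * ||F(x)||^2
    return 0.5 * F(x) @ F(x)

x = np.array([0.3, -0.7])
h = 1e-6

# Central finite-difference gradient of f, compared with J(x)^T F(x).
g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(g_fd, J(x).T @ F(x)))  # True
```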

But if the $f_j$ are nonlinear, it could very well be that some or all of the terms $f_j(x) \nabla^2 f_j(x)$ work against convexity.

Suppose for example that $m=1$ (i.e., there is only one term in all the sums) and that $f_1(x) = \sin(x)$. Then $f_1'(x) = \cos(x)$ and $f_1''(x) = -\sin(x)$. In this case, $f''(x) = -\sin^2(x) + \cos^2(x) = \cos(2x)$, which is not always nonnegative (e.g., $f''(\pi/2) = -1$), so $f$ is not convex.

But on the other hand, suppose $m=1$ and $f_1(x) = -x^2$. Then $f_1'(x) = -2x$ and $f_1''(x) = -2$, so $f''(x) = (-x^2)(-2) + (-2x)^2 = 6 x^2 \geq 0$. This one is convex.
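Both one-dimensional examples are easy to check numerically with the scalar form of the Hessian above, $f''(x) = f_1(x) f_1''(x) + f_1'(x)^2$. A short sketch that probes the sign on a grid (a numerical check, not a proof):

```python
import numpy as np

# Scalar Hessian f''(x) = f1(x) * f1''(x) + f1'(x)**2 for m = 1,
# evaluated on a grid to probe its sign.
x = np.linspace(-2 * np.pi, 2 * np.pi, 1001)

# Example 1: f1(x) = sin(x)  =>  f''(x) = -sin(x)**2 + cos(x)**2 = cos(2x)
curv_sin = np.sin(x) * (-np.sin(x)) + np.cos(x) ** 2
print(curv_sin.min() < 0)        # True: the Hessian goes negative, f is nonconvex

# Example 2: f1(x) = -x**2   =>  f''(x) = (-x**2)*(-2) + (-2x)**2 = 6x**2
curv_quad = (-x ** 2) * (-2.0) + (-2.0 * x) ** 2
print((curv_quad >= 0).all())    # True: the Hessian is nonnegative, f is convex
```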

From the expression of the Hessian above, you can see that if either

  1. each $f_j$ is nonnegative and convex, or
  2. each $f_j$ is nonpositive and concave,

then $f$ is convex. But you can't reverse this implication: for instance, with $m=1$ and $f_1(x) = x$, neither condition holds, yet $f(x) = x^2/2$ is convex.
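As a sanity check of condition 1, here is a small sketch with two residuals of my own choosing, $f_1(x) = x_1^2 + x_2^2$ and $f_2(x) = e^{x_1}$ (both nonnegative and convex); it assembles the Hessian from the formula above and samples random points to confirm it stays positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)

def hessian(x):
    # Hessian of f = 0.5*(f1**2 + f2**2) with f1 = x0**2 + x1**2, f2 = exp(x0),
    # assembled term by term as sum_j [ f_j * Hess(f_j) + grad(f_j) grad(f_j)^T ].
    x0, x1 = x
    g1 = np.array([2 * x0, 2 * x1])                  # grad f1
    H1 = 2 * np.eye(2)                               # Hess f1
    g2 = np.array([np.exp(x0), 0.0])                 # grad f2
    H2 = np.array([[np.exp(x0), 0.0], [0.0, 0.0]])   # Hess f2
    return ((x0 ** 2 + x1 ** 2) * H1 + np.outer(g1, g1)
            + np.exp(x0) * H2 + np.outer(g2, g2))

# Sample random points; the smallest eigenvalue should never be
# (meaningfully) negative, up to floating-point noise.
ok = all(np.linalg.eigvalsh(hessian(rng.normal(size=2))).min() >= -1e-12
         for _ in range(1000))
print(ok)  # True
```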