I have the following convex optimization problem over $\mathbb{R}^n$ with box constraints:

$$\begin{align}\text{minimize }&\;f(x)\\ \text{subject to }&\;x \in [-1,1]^n\end{align}$$

I can see two approaches to handle this:

Approach 1. Use a dedicated convex optimization method that handles box constraints (e.g., projected gradient descent; L-BFGS-B, i.e., L-BFGS with box constraints).
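For concreteness, here is roughly what I mean, using SciPy's L-BFGS-B (the objective below is just a placeholder, not my actual $f$):

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder convex objective whose unconstrained minimizer lies outside the box.
def f(x):
    return np.sum((x - 2.0) ** 2)

def grad_f(x):
    return 2.0 * (x - 2.0)

n = 5
res = minimize(f, x0=np.zeros(n), jac=grad_f, method="L-BFGS-B",
               bounds=[(-1.0, 1.0)] * n)
print(res.x)  # every component ends up clipped to the upper bound, x_i = 1
```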

Approach 2. Apply a nonlinear transform to force $x$ to be in the box $[-1,1]^n$. In particular, define

$$g(w) = f(\varphi(w))$$

where $\varphi: \mathbb{R}^n \to \mathbb{R}^n$ applies the hyperbolic tangent coordinate-wise, i.e., $\varphi(w)_i = \tanh w_i$. Then, solve the unconstrained optimization problem

$$\text{minimize }\;g(w)$$

where now we allow $w$ to range over all of $\mathbb{R}^n$. Note that the solution $w$ to this unconstrained problem immediately yields a solution to the original problem, by taking $x = \varphi(w)$. We can then use any optimization procedure to solve the unconstrained problem -- we're not limited to ones that explicitly support box constraints. Basically, I'm just applying a substitution or change-of-variables to force the box constraints to always hold.
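Here is a rough sketch of Approach 2, again with a placeholder objective; the only new ingredient is the chain-rule factor $1 - \tanh(w_i)^2$ in the gradient of $g$:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder convex objective; its unconstrained minimizer 0.3 lies inside the box.
def f(x):
    return np.sum((x - 0.3) ** 2)

def grad_f(x):
    return 2.0 * (x - 0.3)

def g(w):
    return f(np.tanh(w))

def grad_g(w):
    # Chain rule: dg/dw_i = (df/dx_i at x = tanh(w)) * (1 - tanh(w_i)^2)
    x = np.tanh(w)
    return grad_f(x) * (1.0 - x ** 2)

n = 5
res = minimize(g, x0=np.zeros(n), jac=grad_g, method="BFGS")  # unconstrained solver
x_opt = np.tanh(res.x)  # map the unconstrained solution back into the box
print(x_opt)            # approximately 0.3 in every coordinate
```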

My question. Is there any reason to prefer one approach over the other? Should we expect one to converge significantly faster, or be more robust, or something?

In the readings I've done so far, I've seen lots of references to Approach 1, but I've never seen anyone mention Approach 2. Are there any pitfalls with Approach 2 that I should be aware of?

I have a convex optimization solver that seems to work well in my domain, but doesn't support box constraints. So, if there are no pitfalls or shortcomings of Approach 2, it would be convenient to apply the solver I already have to solve the unconstrained optimization problem. (The existing solver already has many well-tuned heuristics; it would be painful to implement projected gradient descent or L-BFGS from scratch and implement all of those heuristics.) However, I want to find out if there are some pitfalls I might not be aware of with Approach 2.

Approach 2 seems vaguely reminiscent of barrier methods (or interior point methods), though it's not the same. Does the theory in that area offer any relevant insights into this question?


A solution to your original problem may well have some components equal to $\pm 1$, but the hyperbolic tangent never attains $\pm 1$. In that case the reformulated problem has no minimizer. Also, $g$ is in general not convex, so convexity has been lost.
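To make this concrete with a simple one-dimensional example (not from your problem, just illustrative): take $f(x) = x$ on $[-1,1]$, whose minimizer is $x^\star = -1$. The reformulation is $g(w) = \tanh w$, whose infimum $-1$ is approached only as $w \to -\infty$ and is never attained, so an unconstrained solver will drive $w$ off toward $-\infty$ (or stall in floating point). Note also that $\tanh$ is convex for $w < 0$ and concave for $w > 0$, so even in this simple case $g$ is neither convex nor concave on $\mathbb{R}$.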

You mentioned the projected gradient method. You might also consider FISTA or the TFOCS software package.
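If you go the first-order route, FISTA specializes nicely to box constraints, since the proximal step for the indicator of $[-1,1]^n$ is just clipping. A minimal sketch (the function names and fixed step size are illustrative assumptions, not TFOCS code):

```python
import numpy as np

def fista_box(grad_f, L, x0, max_iter=500):
    """FISTA for min f(x) over the box [-1, 1]^n.
    grad_f(x) returns the gradient of f; L is a Lipschitz constant of grad_f."""
    x_prev = np.clip(np.asarray(x0, dtype=float), -1.0, 1.0)
    y = x_prev.copy()
    t = 1.0
    for _ in range(max_iter):
        x = np.clip(y - grad_f(y) / L, -1.0, 1.0)        # projected gradient step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        y = x + ((t - 1.0) / t_next) * (x - x_prev)      # momentum / extrapolation
        x_prev, t = x, t_next
    return x_prev
```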


Even though the question is old, let me suggest some other approaches in case someone with a similar problem finds their way here.

The projected Newton method has been very useful to me when I need to minimize a quadratic objective under box constraints, especially when the Hessian matrix turns out to be sparse or has some structure that allows linear systems with it to be solved quickly.
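Here is a rough sketch of the idea for a quadratic $f(x) = \tfrac12 x^\top Q x + b^\top x$ with $Q$ positive definite; the active-set rule and tolerances are illustrative simplifications (no $\varepsilon$-active-set safeguards), not a production implementation:

```python
import numpy as np

def projected_newton_quadratic(Q, b, lo=-1.0, hi=1.0, x0=None,
                               max_iter=100, tol=1e-8):
    """Minimize 0.5 x'Qx + b'x subject to lo <= x <= hi (elementwise),
    with Q symmetric positive definite."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.clip(x0, lo, hi)
    for _ in range(max_iter):
        g = Q @ x + b
        # Variables at a bound with the gradient pushing outward are held fixed.
        active = ((x <= lo + 1e-10) & (g > 0)) | ((x >= hi - 1e-10) & (g < 0))
        free = ~active
        # Stop when the projected gradient is small.
        pg = np.where(active, 0.0, g)
        if np.linalg.norm(pg) <= tol:
            break
        # Newton step on the free variables only.
        d = np.zeros(n)
        d[free] = -np.linalg.solve(Q[np.ix_(free, free)], g[free])
        # Backtracking (Armijo) line search along the projection arc.
        t = 1.0
        fx = 0.5 * x @ Q @ x + b @ x
        while t > 1e-12:
            x_new = np.clip(x + t * d, lo, hi)
            f_new = 0.5 * x_new @ Q @ x_new + b @ x_new
            if f_new <= fx + 1e-4 * g @ (x_new - x):
                break
            t *= 0.5
        x = x_new
    return x
```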

Alternatively, if your problem is not quadratic but the gradient is cheap to compute, I would suggest the already-mentioned projected gradient method, or the Frank-Wolfe method. Usually the tricky part of Frank-Wolfe is the subproblem where you minimize a linear function subject to your constraints. Fortunately, having just box constraints makes this very easy, since the minimizer is simply a vertex of the box!
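For the box $[-1,1]^n$ the linear subproblem has the closed-form solution $s_i = -\operatorname{sign}(\nabla f(x)_i)$, so a basic Frank-Wolfe iteration is only a few lines (a sketch with the classic $2/(k+2)$ step size; `grad_f` is a placeholder for your gradient):

```python
import numpy as np

def frank_wolfe_box(grad_f, x0, max_iter=500):
    """Frank-Wolfe for min f(x) over the box [-1, 1]^n."""
    x = np.clip(np.asarray(x0, dtype=float), -1.0, 1.0)
    for k in range(max_iter):
        g = grad_f(x)
        # Linear subproblem: minimize <g, s> over the box; solution is a vertex.
        s = -np.sign(g)
        gamma = 2.0 / (k + 2.0)      # classic step-size schedule
        x = x + gamma * (s - x)
    return x
```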