How to set up Lagrangian optimization with matrix constraints

Suppose we have a function $f: \mathbb{R} \to \mathbb{R}$ which we want to optimize subject to a constraint $g(x) \le c$, where $g: \mathbb{R} \to \mathbb{R}$. We can set up the Lagrangian \begin{align*} L(x)=f(x)+\lambda(g(x)-c) \end{align*} and optimize.

My question is the following.

Now suppose we have a function $f: \mathbb{R}^n \to \mathbb{R}$ subject to $g(X) \le K$, but now $g: \mathbb{R}^n \to \mathbb{R}^n$. For example, $g$ could give the constraint $XX^T \preceq K$, where $K$ is some matrix. How do we set up the Lagrangian in this case? Thank you for any help.


Solution 1:

Here's the TL;DR version, for your specific example. The Lagrangian is $$L(X,Z) = f(X) - \langle Z, K - XX^T \rangle$$ where the inner product is the simple elementwise inner product, and the Lagrange multiplier $Z$ is positive semidefinite.
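To make this concrete, here is a minimal numerical sketch in NumPy. The objective $f(X)=\operatorname{Tr}(X)$, the matrix sizes, and the particular choices of $K$ and $Z$ are placeholder assumptions for illustration, not part of the original problem:

```python
import numpy as np

# Hypothetical toy instance: f(X) = Tr(X) stands in for the objective,
# and the sizes of X, K, Z are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n = 3
X = rng.standard_normal((n, n))
K = 5.0 * np.eye(n)      # constraint bound in X X^T <= K (semidefinite order)
Z = np.eye(n)            # any positive semidefinite multiplier

def inner(A, B):
    # Elementwise (trace) inner product: <A, B> = Tr(A^T B)
    return np.trace(A.T @ B)

def lagrangian(X, Z):
    f = np.trace(X)                     # placeholder objective f(X)
    return f - inner(Z, K - X @ X.T)

print(lagrangian(X, Z))
```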

A more general discussion: the Lagrangian looks like this: $$L(x,\lambda) = f(x) - \langle \lambda, c - g(x)\rangle$$ In order to fully define this quantity, you need to know:

  • the (primal) vector space that $c$, $g(x)$ etc. lie in
  • the dual vector space that $\lambda$ lies in
  • the inner product $\langle \cdot,\cdot \rangle$
  • the interpretation of the inequality $g(x)\leq c$, which determines the constraint on $\lambda$.

The primal vector space is usually evident from context. In the case of $XX^T\preceq K$, it would be the set of symmetric matrices of appropriate size.

The dual vector space and inner product are defined together. In most cases, you will likely use the simple elementwise inner product; e.g., for both symmetric and non-symmetric matrices, this is $$\langle A, B \rangle = \mathop{\textrm{Tr}}(A^TB) = \sum_{i,j} A_{ij} B_{ij}.$$ You need a real inner product, so for complex matrices it would be $$\langle A, B \rangle = \Re(\mathop{\textrm{Tr}}(A^HB)) = \sum_{i,j} \Re(\overline{A_{ij}} B_{ij})=\sum_{i,j} \Re(A_{ij})\Re(B_{ij})+\Im(A_{ij})\Im(B_{ij}).$$

In most finite-dimensional cases with the default inner product, the primal space and dual space are the same. Exceptions arise when you want to define a vector space with specific structure; e.g., the set of Toeplitz matrices. But usually you have a choice between selecting the inner product so that the space is its own dual, or using a simpler inner product and defining the dual space appropriately.
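Both identities are easy to check numerically; the random matrices below are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Real case: Tr(A^T B) equals the elementwise sum of products.
ip_trace = np.trace(A.T @ B)
ip_elem  = np.sum(A * B)
print(np.isclose(ip_trace, ip_elem))

# Complex case: Re Tr(A^H B) splits into real and imaginary parts.
C = A + 1j * rng.standard_normal((3, 3))
D = B + 1j * rng.standard_normal((3, 3))
ip_complex = np.real(np.trace(C.conj().T @ D))
ip_split   = np.sum(C.real * D.real + C.imag * D.imag)
print(np.isclose(ip_complex, ip_split))
```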

Finally, there is the issue of how the Lagrange multiplier is constrained. The answer depends on how you are defining the inequality $g(x)\leq c$. For instance, if you mean it in an elementwise sense $$g(x) \leq c \quad\Longleftrightarrow\quad [g(x)]_i \leq c_i, ~i=1,2,\dots, n$$ then $\lambda\in\mathbb{R}^n_+$; i.e., it is elementwise nonnegative. On the other hand, if you intend the inequality on the semidefinite cone $\mathcal{S}^n_+$ $$XX^T\preceq K \quad\Longleftrightarrow\quad K - XX^T \text{~is positive semidefinite}$$ then $\lambda\in\mathcal{S}^n_+$; i.e., it is positive semidefinite.
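Here is a quick sketch of how the two readings of the inequality translate into different feasibility checks. The specific vectors, the matrix sizes, and the eigenvalue tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

# Elementwise reading of g(x) <= c: check each coordinate.
gx = np.array([0.5, 1.0, -2.0])
c  = np.array([1.0, 1.0,  0.0])
elementwise_feasible = np.all(gx <= c)

# Semidefinite reading of X X^T <= K: check that K - X X^T is PSD
# via its eigenvalues (a small tolerance guards against rounding).
X = 0.1 * rng.standard_normal((n, n))
K = np.eye(n)
S = K - X @ X.T
psd_feasible = np.all(np.linalg.eigvalsh(S) >= -1e-10)

print(elementwise_feasible, psd_feasible)
```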

In the most general case, we can define a generalized inequality $$g(x) \preceq_{\mathcal{K}} c \quad\Longleftrightarrow\quad c - g(x) \in \mathcal{K}$$ where $\mathcal{K}$ is a closed, convex, pointed cone with a non-empty interior (i.e., a proper cone). Generalized inequalities retain many of the key properties of standard inequalities, including partial ordering, transitivity, positive scaling, and so forth. For generalized inequalities, the Lagrange multiplier must lie in the dual cone $\mathcal{K}^*$: $$\mathcal{K}^* = \left\{z~\middle|~\langle z, x \rangle \geq 0~\forall x\in\mathcal{K}\right\}$$ It can be shown that $\mathcal{K}^*$ is always a proper cone when $\mathcal{K}$ is proper. This definition reduces precisely to the two examples above when $\mathcal{K}$ is the nonnegative orthant or the semidefinite cone; in those cases, $\mathcal{K}^*=\mathcal{K}$.
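Dual-cone membership can be sanity-checked numerically. The sampling test below is only a necessary condition (it checks finitely many $x \in \mathcal{K}$ rather than all of them), and the helper names are made up for this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

def in_dual_cone(z, cone_samples, tol=1e-10):
    # Necessary (sampling-based) test: <z, x> >= 0 for every sampled x in K.
    return all(float(np.dot(z, x)) >= -tol for x in cone_samples)

# Nonnegative orthant: sample nonnegative vectors, plus e_2 explicitly.
orthant = [np.abs(rng.standard_normal(3)) for _ in range(100)]
orthant.append(np.array([0.0, 1.0, 0.0]))

assert in_dual_cone(np.array([1.0, 2.0, 0.0]), orthant)       # z >= 0 passes
assert not in_dual_cone(np.array([1.0, -1.0, 0.0]), orthant)  # negative entry fails

# PSD cone: Tr(Z S) >= 0 whenever Z and S are both PSD (self-duality).
def rand_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

assert all(np.trace(rand_psd(3) @ rand_psd(3)) >= 0 for _ in range(100))
```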

Notice that the definition of the dual cone ensures that $$\langle \lambda, c - g(x) \rangle \geq 0$$ when $x$ is feasible and $\lambda$ is dual feasible. This is of course the behavior you want.
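A quick check of this nonnegativity for the semidefinite example; the particular $X$, $K$, and $Z$ below are illustrative choices, scaled so that $X$ is primal feasible:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3

X = 0.1 * rng.standard_normal((n, n))   # small entries keep X X^T <= I
K = np.eye(n)
M = rng.standard_normal((n, n))
Z = M @ M.T                             # dual feasible: PSD multiplier

slack = K - X @ X.T
assert np.all(np.linalg.eigvalsh(slack) >= -1e-10)  # primal feasibility
pairing = np.trace(Z @ slack)           # <Z, K - X X^T>
print(pairing)
```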