Strong convexity of Loss function in Logistic Regression

I wonder if the Loss function of a Logistic regression can have strong convexity when the explanatory variables are linearly independent. From a theoretical point of view, if I have a sample of p variables and n observations with the Logistic Regression. I just summarize the key results here. The Loss function for the case $Y \in \{-1,1\}$ (which is minus the Log-likelihood function)

$\displaystyle LL(\theta) = \sum_{i=1}^n \log \left( 1 + \exp(-Y_i \theta^T X_i ) \right)$

The Hessian is :

$\displaystyle \nabla^2LL(\theta) = X^TAX $ where A is a diagonal matrix with $0<A_{ii} \leq 1/4$.

We know that the Hessian is a Gram matrix, which is by definition positive semi-definite. With the assumption that $X$ has full rank, the Gram matrix is positive definite (All the eigenvalues > 0)

However, with these assumptions, can we go further and conclude that the Loss function is strongly convex when the explanatory variables matrix has full rank? If not, what do we miss here?

Solution 1:

To prove that your function is strongly convex with parameter $\sigma$, we must show that $X^TAX - \sigma I$ is positive semidefinite, which is equivalent to saying $$w^TX^TAXw \geq \sigma w^Tw$$ for any $w$. We can set the constraint $||w||=1$ and since $A$ is positive definite, we can set $\tilde{A}=\sqrt{A}$ to get $$w^T(\tilde{A}X)^T\tilde{A}Xw \geq \sigma,$$ which holds true for any $\sigma$ less than the smallest singular value of $(\tilde{A}X)$.

Solution 2:

With respect to @Yngve Moe's answer, $\sigma$ should be less than the smallest eigenvalue of $X^TAX$(which is the square of the smallest singular value of $\tilde{A}X$).

Let $\tilde{A}X=U\Sigma V^T$ its singular value decomposition. We want find $\sigma$ that for any $w \ne 0$,

$$w^TX^TAXw - \sigma w^Tw \ge 0$$

$$w^TV\Sigma^2V^Tw-\sigma w^TVV^Tw \ge 0$$

$$w^TV(\Sigma^2 - \sigma I_n)V^T w \ge 0$$

As $V$ is of full rank, then $V^T w$ is not $0$, so this inequality becomes true when the matrix $\Sigma^2 - \sigma I_n$ is positive semidenifite, which is equivalent to: $\sigma \le \text{the square of the smallest singular value of } \tilde{A}X$, this means: $\sigma \le $ the smallest eigenvalue of $X^TAX$.

To complete, if $X$ is not of full rank, then neither is $X^TAX$, then $X^TAX$ is not positive definite, thus its smallest eigenvalue is not positive, then the loss function is not ($\mu$-) strongly convex.

Why $c_1 ^{\omicron(\sigma)} = c_1 \implies c_1(\tau _1) = \tau_1 $ etc.

Show that there exists a regular function that is not a quotient of polynomials

Show that the Absolute Value Function Is not Rational

Convex Optimization Problem (example)

If $f \in \mathcal{R}[a, b]$ and $f(x) \geq 0$ almost everywhere, prove that $\int^b_a f\geq 0$

If $n \mid a^n - 1$, prove $ a + 1 $, $ a^2 + 2 $, ..., $ a^n + n $ are distinct $ \bmod n $.

About degree of a smooth map between Manifolds

when to use which z score equation?

If $A$ is a finitely generated $k$-algebra with no nontrivial maps $m/m^2\to k$ for any $m\subset A$, is $A\cong k^n$ as modules?

Isomorphic cokernels giving isomorphic kernels

Prove $T$ is a linear map on $V$, and the null space and range space are both finite, then vector space $V$ is finite.

How do you find the integral of $\int \sin^{5/2}x \cos^3x$