Strong convexity of Loss function in Logistic Regression
I wonder whether the loss function of logistic regression can be strongly convex when the explanatory variables are linearly independent. From a theoretical point of view, suppose I have a sample of $p$ variables and $n$ observations. I just summarize the key results here. The loss function for the case $Y \in \{-1,1\}$ (which is the negative log-likelihood) is
$\displaystyle LL(\theta) = \sum_{i=1}^n \log \left( 1 + \exp(-Y_i \theta^T X_i ) \right)$
The Hessian is
$\displaystyle \nabla^2LL(\theta) = X^TAX$ where $A$ is the diagonal matrix with entries $A_{ii} = s_i(1-s_i)$, $s_i = 1/(1+\exp(-Y_i \theta^T X_i))$, so $0<A_{ii} \leq 1/4$.
We know that the Hessian is a Gram matrix, which is by definition positive semi-definite. Under the assumption that $X$ has full column rank, the Gram matrix is positive definite (all eigenvalues $> 0$).
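As a sanity check, here is a small numerical sketch (the sample sizes and random data are illustrative assumptions, not from the question) that builds this Hessian as $X^TAX$ and confirms it is positive definite when $X$ has full column rank:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))   # full column rank with probability 1
y = rng.choice([-1.0, 1.0], size=n)
theta = rng.standard_normal(p)

# A is diagonal with A_ii = s_i (1 - s_i), where s_i = sigmoid(y_i * theta^T x_i);
# each entry lies in (0, 1/4].
s = 1.0 / (1.0 + np.exp(-y * (X @ theta)))
A = np.diag(s * (1.0 - s))

H = X.T @ A @ X                   # Hessian of the logistic loss at theta
eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() > 0)          # positive definite when rank(X) = p
```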
However, under these assumptions, can we go further and conclude that the loss function is strongly convex when the matrix of explanatory variables has full column rank? If not, what are we missing?
Solution 1:
To prove that your function is strongly convex with parameter $\sigma$, we must show that $X^TAX - \sigma I$ is positive semidefinite, which is equivalent to saying $$w^TX^TAXw \geq \sigma w^Tw$$ for any $w$. We can impose the constraint $||w||=1$, and since $A$ is positive definite we can set $\tilde{A}=\sqrt{A}$ to get $$w^T(\tilde{A}X)^T\tilde{A}Xw \geq \sigma,$$ which holds for any $\sigma$ less than the smallest singular value of $\tilde{A}X$.
Solution 2:
Regarding @Yngve Moe's answer, $\sigma$ should be at most the smallest eigenvalue of $X^TAX$ (which is the square of the smallest singular value of $\tilde{A}X$, not the singular value itself).
Let $\tilde{A}X=U\Sigma V^T$ be its singular value decomposition. We want to find $\sigma$ such that for any $w \ne 0$,
$$w^TX^TAXw - \sigma w^Tw \ge 0$$
$$w^TV\Sigma^2V^Tw-\sigma w^TVV^Tw \ge 0$$
$$w^TV(\Sigma^2 - \sigma I_p)V^T w \ge 0$$
As $V$ has full rank, $V^T w \ne 0$ for $w \ne 0$, so this inequality holds whenever the matrix $\Sigma^2 - \sigma I_p$ is positive semidefinite, which is equivalent to: $\sigma \le \text{the square of the smallest singular value of } \tilde{A}X$, that is, $\sigma \le$ the smallest eigenvalue of $X^TAX$.
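This identity is easy to verify numerically. A minimal sketch (assuming numpy and an arbitrary diagonal $A$ with entries in $(0, 1/4]$, chosen for illustration) checking that $\lambda_{\min}(X^TAX)$ equals the squared smallest singular value of $\tilde{A}X$ and works as a strong-convexity parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.standard_normal((n, p))
A = np.diag(rng.uniform(0.05, 0.25, size=n))   # any diagonal A with 0 < A_ii <= 1/4

H = X.T @ A @ X
sigma = np.linalg.eigvalsh(H).min()            # candidate strong-convexity parameter

# the squared smallest singular value of sqrt(A) X matches lambda_min(H)
smin = np.linalg.svd(np.sqrt(A) @ X, compute_uv=False).min()
print(np.isclose(sigma, smin**2))              # True

# and the quadratic lower bound w^T H w >= sigma * ||w||^2 holds for random w
w = rng.standard_normal(p)
print(w @ H @ w >= sigma * (w @ w) - 1e-10)    # True
```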
To complete the argument: if $X$ does not have full column rank, then neither does $X^TAX$, so $X^TAX$ is not positive definite; its smallest eigenvalue is zero, and the loss function is not ($\mu$-)strongly convex for any $\mu > 0$.
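The rank-deficient case can be checked the same way. A short sketch (again with illustrative random data) where one column of $X$ is duplicated, so $X^TAX$ is singular and the best possible strong-convexity parameter collapses to zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.standard_normal((n, p))
X = np.hstack([X, X[:, :1]])                  # duplicate a column: rank(X) = 3 < 4
A = np.diag(rng.uniform(0.05, 0.25, size=n))

H = X.T @ A @ X                               # singular, since rank(H) <= rank(X)
lam_min = np.linalg.eigvalsh(H).min()
print(abs(lam_min) < 1e-10)                   # smallest eigenvalue is numerically zero
```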