How to find $\min_{a \neq 0}\dfrac{a^TXX^Ta}{a^T\bar{X}\bar{X}^Ta}$ if $X$ is a matrix and $a$ a vector?

Suppose that $a$ is a $p \times 1$ fixed vector of real numbers, and that $X$ is a $p\times n$ matrix where $XX^T$ is a symmetric, positive definite matrix. Now, let

$$ \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i $$

where $X_i$ is the $i$th column of $X$.

I would like to find the following minimum:

$$ \min_{a \neq 0}\dfrac{a^TXX^Ta}{a^T\bar{X}\bar{X}^Ta} $$

One idea that comes to mind is to apply the Cauchy–Schwarz inequality, but that only yields an upper bound, i.e. it handles the maximum. Does anyone have ideas on how to carry out the minimization? Thanks.


Let $v$ be the vector obtained by summing the columns of $X$ (so $v = n\bar{X}$). Let $P= I - \frac{v v^T}{\|v\|^2}$ be the projection matrix such that $P v = 0$ and $P u = u$ whenever $\langle u,v \rangle = 0$.

WLOG you can take $a = v-Pu$, which makes the denominator constant: since $v^TPu = (Pv)^Tu = 0$, we get $v^Ta = \|v\|^2$, hence $a^T\bar{X}\bar{X}^Ta = (v^Ta)^2/n^2 = \|v\|^4/n^2$.

Thus, you need to minimize $$J(u) = (v-Pu)^TXX^T(v-Pu).$$ The gradient is $$\nabla J(u) = 2P^T XX^TPu-2P^TXX^Tv,$$ and setting it to zero gives the solution $$u = (P^T XX^TP)^{+}P^TXX^Tv,$$ where ${}^+$ denotes the Moore–Penrose pseudo-inverse (note $P^T = P$, since $P$ is an orthogonal projector).
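The recipe above can be sketched numerically. This is a minimal check with random data (the variable names and test setup are mine, not part of the answer): build $v$ and $P$, solve the stationarity condition with the pseudo-inverse, and verify that the resulting $a$ beats a large sample of random directions.

```python
import numpy as np

# Sketch of the projection approach, assuming random Gaussian data so that
# G = X X^T is positive definite with high probability.
rng = np.random.default_rng(0)
p, n = 4, 10
X = rng.standard_normal((p, n))

v = X.sum(axis=1)                          # v = sum of columns = n * xbar
P = np.eye(p) - np.outer(v, v) / (v @ v)   # projector with P v = 0
G = X @ X.T                                # G = X X^T

# Stationary point of J(u) = (v - Pu)^T G (v - Pu); here P^T = P.
u = np.linalg.pinv(P @ G @ P) @ (P @ G @ v)
a = v - P @ u

xbar = X.mean(axis=1)
ratio = (a @ G @ a) / (xbar @ a) ** 2      # objective value at the candidate a

# Compare against many random directions; none should do better.
A = rng.standard_normal((p, 50_000))
vals = np.einsum('ij,ij->j', A, G @ A) / (xbar @ A) ** 2
print(ratio, vals.min())
```

The factor $1/n^2$ in the denominator cancels in the ratio, so the check works with $\bar X$ directly.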


$ \def\bbR#1{{\mathbb R}^{#1}} \def\a{\gamma}\def\b{\beta}\def\l{\lambda}\def\m{\mu} \def\e{\varepsilon}\def\o{{\tt1}}\def\p{\partial} \def\B{\Big}\def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\B(#1\B)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Use $\e_k$ to denote the standard vector basis for $\bbR{n}$, $\o\in\bbR{n}$ to denote the all-ones vector, and $J\in\bbR{n\times n}$ the all-ones matrix. Use these to extract the $k^{th}$ column of $X$ and thence the $\bar x\bar x^T$ matrix.
$$\eqalign{ \bar x &= \frac{\o}{n}\sum_{k=\o}^n x_k \;=\; \frac{\o}{n}X \sum_{k=\o}^n \e_k \;=\; \frac{X\o}{n} \\ \bar x\bar x^T &= \frac{X \o\o^TX^T}{n^2} \;=\; \frac{XJX^T}{n^2} \\ }$$

For typing convenience, define the matrix variables
$$G = XX^T \qquad H = \bar x\bar x^T$$

Now consider the following quadratic forms and their gradients with respect to $a$.
$$\eqalign{ \a &= a^TGa \qiq \grad{\a}{a} = {2Ga} \\ \b &= a^THa \qiq \grad{\b}{a} = {2Ha} \\ }$$

Write the cost function as the ratio of these forms and calculate its gradient.
$$\eqalign{ \m &= \b^{-\o}\a \\ \grad{\m}{a} &= \LR{\b^{-1}\grad{\a}{a} - \a\b^{-2}\,\grad{\b}{a}} \;=\; 2\b^{-\o}\BR{Ga-\m Ha} \\ }$$

Setting the gradient equal to zero yields a Generalized Eigenvalue Equation
$$\eqalign{ Ga &= \m Ha \\ }$$

However, since $G$ is SPD, it can be inverted to yield a standard Eigenvalue Equation
$$\eqalign{ \LR{G^{-\o}H}a &= \m^{-\o}a \\ Ma &= \l a \\ }$$

Therefore the extrema of $\m$ occur at the eigenvectors of the matrix
$$M = {G^{-\o}H} \;=\; \frac{\o}{n^2}\LR{XX^T}^{-\o}\LR{XJX^T}$$

In particular, $\,\m_{min}=\l_{max}^{-\o}\,$ and occurs when $a$ equals the associated eigenvector.
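In fact, since $H = \bar x\bar x^T$ has rank one, $\l_{max}$ has a closed form via the trace:
$$\l_{max} \;=\; \trace{M} \;=\; \trace{G^{-\o}\bar x\bar x^T} \;=\; \bar x^TG^{-\o}\bar x \qiq \m_{min} \;=\; \frac{\o}{\bar x^T\LR{XX^T}^{-\o}\bar x}$$
with minimizer $a \propto G^{-\o}\bar x$ (one checks directly that $M\LR{G^{-\o}\bar x} = \LR{\bar x^TG^{-\o}\bar x}G^{-\o}\bar x$).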

NB: Since $H$ is a rank-$\o$ matrix, so is $M$; therefore $\l_{max}$ is its only non-zero eigenvalue.
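As a quick numerical sanity check of the eigenvalue result (this sketch and its random test data are my own): because $M = G^{-1}H$ has rank one, its only non-zero eigenvalue equals $\operatorname{Tr}(M) = \bar x^TG^{-1}\bar x$, and the objective evaluated at the candidate eigenvector $a = G^{-1}\bar x$ should recover $\mu_{min} = 1/\lambda_{max}$.

```python
import numpy as np

# Verify lambda_max = Tr(M) = xbar^T G^{-1} xbar and mu_min = 1/lambda_max,
# assuming random Gaussian data so that G = X X^T is invertible.
rng = np.random.default_rng(1)
p, n = 5, 12
X = rng.standard_normal((p, n))

xbar = X.mean(axis=1)
G = X @ X.T                        # G = X X^T  (SPD for generic X)
H = np.outer(xbar, xbar)           # H = xbar xbar^T, rank one

M = np.linalg.solve(G, H)          # M = G^{-1} H
lam_max = np.linalg.eigvals(M).real.max()
lam_trace = xbar @ np.linalg.solve(G, xbar)   # rank-one shortcut: Tr(M)

a = np.linalg.solve(G, xbar)       # candidate minimiser a = G^{-1} xbar
mu_at_a = (a @ G @ a) / (a @ H @ a)

print(lam_max, lam_trace, 1.0 / lam_max, mu_at_a)
```

Using `np.linalg.solve` instead of forming `np.linalg.inv(G)` explicitly is the usual numerically safer choice.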