Concentration of measure bounds for multivariate Gaussian distributions (fixed)
Solution 1:
Note: The following is not an answer, but merely some thoughts which might or might not be helpful to you.
First note that you confused(?) your inequality signs. I think you want $$ \gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{n}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4} $$ and $$ \gamma_{n}\left(\left\{ x\in\mathbb{R}^{n}\,\mid\,\left\Vert x\right\Vert ^{2}\geq\frac{{\rm Trace}\left(\Sigma\right)}{1-\varepsilon}\right\} \right){\color{red}\leq}e^{-\varepsilon n/4}. $$ Also note that this inequality would get stronger for larger values of $n$, which cannot be true in general. To see this, use e.g. $$ \Sigma=\left(\begin{matrix}1\\ & 0\\ & & \ddots\\ & & & 0 \end{matrix}\right), $$ or, if you want your $\Sigma$ to be positive definite, use $\frac{1}{L\left(n-1\right)}$ instead of the zeros on the diagonal, where $L$ is large. Your estimate would then imply (since $\left\Vert x\right\Vert ^{2}\geq\left|x_{1}\right|^{2}$) that $$ \mathbb{P}\left(\left|x_{1}\right|^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq\mathbb{P}\left(\left\Vert x\right\Vert ^{2}\geq\frac{1+\frac{1}{L}}{1-\varepsilon}\right)\leq e^{-\varepsilon n/4}\xrightarrow[n\to\infty]{}0, $$ which is absurd: here $x_{1}\sim N\left(0,1\right)$, so $\left|x_{1}\right|^{2}$ exceeds the fixed threshold with a probability bounded away from $0$, uniformly in $n$.
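For concreteness, here is a small Monte Carlo sketch of this counterexample; the values $\varepsilon=0.1$, $L=10$ and the sample size are my own illustrative choices, not anything from your post:

```python
# Sketch: empirical P(||X||^2 >= trace(Sigma)/(1-eps)) for the counterexample
# Sigma = diag(1, 1/(L(n-1)), ..., 1/(L(n-1))); all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
eps, L = 0.1, 10.0
threshold = (1 + 1 / L) / (1 - eps)        # trace(Sigma)/(1 - eps)

for n in (10, 100, 1000):
    lam = np.full(n, 1.0 / (L * (n - 1)))  # the small eigenvalues
    lam[0] = 1.0                           # the single dominant eigenvalue
    h = rng.standard_normal((10_000, n))
    emp = ((lam * h ** 2).sum(axis=1) >= threshold).mean()
    print(f"n={n}: empirical {emp:.3f} vs claimed bound {np.exp(-eps * n / 4):.1e}")
```

The empirical probability stays near a positive constant while the claimed bound collapses, in line with the argument above.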
Hence, the right-hand side of your estimate (more precisely, its exponent) somehow needs to involve ${\rm trace}\left(\Sigma\right)$ instead of $n$ (I think).
What follows is an adaptation of the argument you linked, but I eventually get stuck when trying to find a good, let alone optimal, value of $\lambda$.
First, since $\Sigma$ is symmetric positive semidefinite, there is an orthogonal matrix $O\in\mathbb{R}^{n\times n}$ with $\Sigma=O \cdot {\rm diag}\left(\lambda_{1},\dots,\lambda_{n}\right)\cdot O^{T}$, where $\lambda_{1},\dots,\lambda_{n}\geq0$ are the eigenvalues of $\Sigma$. We can now define the square root $\sqrt{\Sigma}:=O\cdot {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right) \cdot O^T\in\mathbb{R}^{n\times n}$ which satisfies $\sqrt{\Sigma}^{T}=\sqrt{\Sigma}$ and $\sqrt{\Sigma}\sqrt{\Sigma}=\Sigma$. Now, by well-known properties of the normal distribution, we conclude that $X:=\sqrt{\Sigma}g\sim N\left(0,\Sigma\right)$, where $g\sim N\left(0,{\rm id}\right)$ is a standard normally distributed random vector.
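As a quick numerical sanity check of this construction (with an arbitrary made-up $\Sigma$, not one from your problem):

```python
# Sketch: build sqrt(Sigma) from the eigendecomposition and verify that
# X = sqrt(Sigma) g has empirical covariance close to Sigma.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T                                 # arbitrary PSD test matrix
lams, O = np.linalg.eigh(Sigma)                 # Sigma = O diag(lams) O^T
sqrt_Sigma = O @ np.diag(np.sqrt(np.clip(lams, 0, None))) @ O.T

g = rng.standard_normal((200_000, 4))           # rows ~ N(0, id)
X = g @ sqrt_Sigma                              # sqrt_Sigma symmetric => rows ~ N(0, Sigma)
print(np.abs(np.cov(X, rowvar=False) - Sigma).max())  # small
```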
We also know that the standard normal distribution is invariant under orthogonal transformations, i.e. $h:=O^{T}g\sim N\left(0,{\rm id}\right)$. Finally, $$ \left\Vert X\right\Vert ^{2}=\left\Vert O{\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)O^{T}g\right\Vert ^{2}=\left\Vert {\rm diag}\left(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}\right)h\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}h_{i}^{2}, $$ so that $\left\Vert X\right\Vert ^{2}$ has (as you noted yourself) expectation $$ \mathbb{E}\left\Vert X\right\Vert ^{2}=\sum_{i=1}^{n}\lambda_{i}\mathbb{E}h_{i}^{2}=\sum_{i=1}^{n}\lambda_{i}={\rm trace}\left(\Sigma\right), $$ because $\mathbb{E}h_{i}^{2}={\rm Var}\left(h_{i}\right)=1$ for $h\sim N\left(0,{\rm id}\right)$.
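The identity $\mathbb{E}\left\Vert X\right\Vert^{2}={\rm trace}\left(\Sigma\right)$ can likewise be checked directly from the eigenvalue representation (the eigenvalues below are made up):

```python
# Sketch: E[sum_i lam_i h_i^2] should equal sum_i lam_i = trace(Sigma).
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([3.0, 1.5, 0.5, 0.0])            # illustrative eigenvalues
h = rng.standard_normal((500_000, lam.size))
print((lam * h ** 2).sum(axis=1).mean())        # ~ 5.0
print(lam.sum())                                # trace(Sigma) = 5.0
```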
By reordering, we can assume $\lambda_{1}\geq\dots\geq\lambda_{j}>0=\lambda_{j+1}=\dots=\lambda_{n}$, where $j\in\left\{ 0,\dots,n\right\} $.
Now observe that the Markov/Chebyshev inequality yields, for arbitrary $\lambda>0$, \begin{eqnarray*} \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & = & \mathbb{P}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\geq e^{\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\right)\\ & \leq & e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right), \end{eqnarray*} where \begin{eqnarray*} \mathbb{E}\left(e^{\lambda\left\Vert X\right\Vert ^{2}}\right) & = & \mathbb{E}\left(e^{\sum_{i=1}^{n}\lambda\lambda_{i}h_{i}^{2}}\right)\\ & = & \prod_{i=1}^{j}\mathbb{E}\left(e^{\lambda\lambda_{i}h_{i}^{2}}\right), \end{eqnarray*} by stochastic independence of $\left(h_{1},\dots,h_{n}\right)$; the factors with $\lambda_{i}=0$ are trivially equal to $1$ and have been dropped. The whole point of introducing the exponential is this final identity: by independence, we can pull the product out of the expectation.
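The factorization in the last display can also be tested by simulation; the eigenvalues and the value of $\lambda$ below are arbitrary (chosen small enough that the expectations are finite):

```python
# Sketch: E[exp(lambda * ||X||^2)] vs the product of per-coordinate MGFs.
import numpy as np

rng = np.random.default_rng(2)
lam_i = np.array([0.8, 0.4, 0.2])               # illustrative eigenvalues
t = 0.3                                         # lambda; need t * max(lam_i) < 1/2
h = rng.standard_normal((2_000_000, lam_i.size))
lhs = np.exp((t * lam_i * h ** 2).sum(axis=1)).mean()
rhs = np.prod([np.exp(t * li * h[:, k] ** 2).mean() for k, li in enumerate(lam_i)])
print(lhs, rhs)                                 # agree up to Monte Carlo error
```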
Finally, \begin{eqnarray*} \mathbb{E}\left(e^{\gamma h_{i}^{2}}\right) & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{\gamma x^{2}}\cdot e^{-x^{2}/2}\,{\rm d}x\\ & = & \frac{1}{\sqrt{2\pi}}\cdot\int_{\mathbb{R}}e^{-\left(\sqrt{\frac{1}{2}-\gamma}x\right)^{2}}\,{\rm d}x\\ & \overset{\omega=\sqrt{\frac{1}{2}-\gamma}x}{=} & \frac{1}{\sqrt{2\pi}\cdot\sqrt{\frac{1}{2}-\gamma}}\cdot\int_{\mathbb{R}}e^{-\omega^{2}}\,{\rm d}\omega\\ & = & \frac{1}{\sqrt{1-2\gamma}} \end{eqnarray*} for $\gamma<\frac{1}{2}$.
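This closed form is easy to confirm by numerical quadrature (the test value $\gamma=0.3$ is arbitrary):

```python
# Sketch: E[exp(gamma * h^2)] for h ~ N(0,1) vs 1/sqrt(1 - 2*gamma).
import numpy as np
from scipy.integrate import quad

gamma = 0.3                                     # any test value < 1/2
integrand = lambda x: np.exp(gamma * x ** 2 - x ** 2 / 2) / np.sqrt(2 * np.pi)
mgf, _ = quad(integrand, -np.inf, np.inf)
print(mgf, 1 / np.sqrt(1 - 2 * gamma))          # both ~ 1.5811
```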
All in all, we arrive at $$ \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right)\leq e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}. $$ The problem is now to optimize this w.r.t. $0<\lambda<\frac{1}{2\lambda_{1}}$. One way to simplify(?) this is to use $$ e^{-\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)}\cdot\prod_{i=1}^{j}\frac{1}{\sqrt{1-2\lambda\lambda_{i}}}=e^{-\left[\lambda\left({\rm trace}\left(\Sigma\right)+\delta\right)+\frac{1}{2}\sum_{i=1}^{j}\ln\left(1-2\lambda\lambda_{i}\right)\right]}, $$ where one only has to optimize the exponent. Still, I see neither an easy way to determine the optimal value of $\lambda$, nor a really convenient choice of $\lambda$.
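Even without a closed-form optimizer, the exponent can be minimized numerically; here is a sketch with scipy, where the eigenvalues and $\delta$ are illustrative choices:

```python
# Sketch: numerically minimize the log of the Chernoff bound over
# 0 < lambda < 1/(2*lam_1); all concrete numbers are made up.
import numpy as np
from scipy.optimize import minimize_scalar

lam = np.array([1.0, 0.5, 0.25])                # illustrative positive eigenvalues
trace, delta = lam.sum(), 2.0

def log_bound(t):
    # -[t*(trace+delta) + (1/2) * sum_i log(1 - 2*t*lam_i)]
    return -t * (trace + delta) - 0.5 * np.log(1 - 2 * t * lam).sum()

res = minimize_scalar(log_bound, bounds=(1e-9, 1 / (2 * lam.max()) - 1e-9),
                      method="bounded")
print(res.x, np.exp(res.fun))                   # optimal lambda, resulting bound
```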
One choice inspired by your linked lecture notes is to use $\lambda=\frac{\delta/2}{{\rm trace}\left(\Sigma\right)+\delta}$ (in the standard Gaussian case we have ${\rm trace}\left(\Sigma\right)=n$, so this is exactly the choice used in the lecture notes). This would yield \begin{eqnarray*} \mathbb{P}\left(\left\Vert X\right\Vert ^{2}\geq{\rm trace}\left(\Sigma\right)+\delta\right) & \leq & e^{-\delta/2}\cdot\prod_{i=1}^{j}\sqrt{\frac{{\rm trace}\left(\Sigma\right)+\delta}{{\rm trace}\left(\Sigma\right)+\delta-\delta\lambda_{i}}}, \end{eqnarray*} which still does not seem that great. Note that this choice is only admissible if it satisfies $\lambda<\frac{1}{2\lambda_{1}}$, i.e. if $\delta\lambda_{1}<{\rm trace}\left(\Sigma\right)+\delta$; this holds automatically whenever $\lambda_{1}\leq1$.
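For what it is worth, in the toy setup of the previous sketch one can compare this heuristic choice with the numerical optimum:

```python
# Sketch: heuristic lambda = (delta/2)/(trace+delta) vs numerically optimal
# lambda, in the same made-up setup as above.
import numpy as np
from scipy.optimize import minimize_scalar

lam = np.array([1.0, 0.5, 0.25])                # same illustrative eigenvalues
trace, delta = lam.sum(), 2.0
log_bound = lambda t: -t * (trace + delta) - 0.5 * np.log(1 - 2 * t * lam).sum()

t_heur = (delta / 2) / (trace + delta)          # lecture-notes-inspired choice
t_opt = minimize_scalar(log_bound, bounds=(1e-9, 1 / (2 * lam.max()) - 1e-9),
                        method="bounded").x
print(np.exp(log_bound(t_heur)), np.exp(log_bound(t_opt)))
```

In this particular example the two bounds come out close, but I do not see how to turn that into a general statement.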
I will try to find a good choice of $\lambda$ here. If I come up with something, I will edit the post.