How to prove that $\arg\max_{t}\langle \sum w_i a_{t_i}, a_t\rangle$ is close to $t_1$?
Let $a_t$ be a Gaussian type function with center $t\in \mathbb R^d$, i.e. $$a_t(x)=e^{-||x-t||^2}, \text{ for } x \in \mathbb R^d.$$
Let $b$ be a combination of these Gaussian functions $$b = \sum_{i=1}^s w_i a_{t_i},$$ where $t_i \in \mathbb R^d$ for all $i=1,\dots, s$ and $w_1 > \dots > w_s>0$ is a decreasing sequence of positive weights.
Define $$\hat t = \arg\max_{t\in \mathbb R^d} \langle b, a_t\rangle.$$
Here the inner product $\langle b, a_t\rangle$ is defined as $\int_{x\in \mathbb R^d} b(x) a_t(x)dx$.
Intuitively, if the Gaussian functions are far away from each other, then $\hat t$ will be close to $t_1$; that is, we expect the following.
Claim. If $\Delta= \min_{i\neq j; i, j\leq s}||t_i-t_j||\rightarrow +\infty$ then $\hat t \rightarrow t_1$.
I am wondering how to prove the claim rigorously. I have no idea how to start, since the objective $t\mapsto\langle b, a_t\rangle$ is not concave, so this is not a convex optimization problem. Is there any idea for tackling this problem? Thanks!
Define $D$ as the following constant:
$$ D= \int_{x \in \mathbb{R}^d} e^{-2||x||^2}dx.$$

For the norm $||x||^2 = \sum_{i=1}^d x_i^2$ we have the following fact: for any $x, a, b \in \mathbb{R}^d$,
$$ ||x-a||^2 + ||x-b||^2 = 2||x-\tfrac{a+b}{2}||^2 + \frac{1}{2}||a-b||^2.$$

Define $g:\mathbb{R}^d\rightarrow\mathbb{R}$ by
$$ g(t) = \int_{x \in \mathbb{R}^d} b(x) e^{-||x-t||^2}dx$$
and define
$$g^* = \sup_{t \in \mathbb{R}^d} g(t).$$

Then for any $t \in \mathbb{R}^d$ we get
\begin{align} g(t) &= \sum_{i=1}^s w_i \int_{x \in \mathbb{R}^d} e^{-||x-t_i||^2} e^{-||x-t||^2} dx\\ &\overset{(a)}{=}\sum_{i=1}^s w_i \int_{x \in \mathbb{R}^d}e^{-\left[2||x - \frac{t+t_i}{2}||^2 + \frac{1}{2}||t-t_i||^2 \right]} dx\\ &= \sum_{i=1}^s w_i e^{-\frac{1}{2}||t-t_i||^2}\int_{x \in \mathbb{R}^d} e^{-2||x-\frac{t+t_i}{2}||^2}dx\\ &= D\sum_{i=1}^s w_i e^{-\frac{1}{2}||t-t_i||^2} \end{align}
where step (a) uses our fact about norms. In particular:
$$g^*\geq g(t_1) > D w_1.$$
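As a quick numerical sanity check of this closed form (not part of the original argument), here is a minimal sketch in $d=1$, where $D=\sqrt{\pi/2}$; the weights, centers, query point and integration grid are arbitrary illustrative choices:

```python
import numpy as np

# Sketch (d = 1): check the closed form g(t) = D * sum_i w_i * exp(-(t - t_i)^2 / 2)
# against a direct Riemann-sum approximation of the integral defining g.
# In d = 1, D = integral of exp(-2 x^2) dx = sqrt(pi / 2).  All concrete
# numbers below are arbitrary illustrative choices.
w = np.array([3.0, 2.0, 1.0])            # weights w_1 > w_2 > w_3 > 0
centers = np.array([0.0, 5.0, -4.0])     # centers t_1, t_2, t_3
t = 1.3                                  # query point

x = np.linspace(-30.0, 30.0, 600_001)    # grid covering the effective support
dx = x[1] - x[0]
b = np.sum(w[:, None] * np.exp(-(x[None, :] - centers[:, None]) ** 2), axis=0)
numeric = np.sum(b * np.exp(-(x - t) ** 2)) * dx     # Riemann sum for <b, a_t>

D = np.sqrt(np.pi / 2.0)
closed_form = D * np.sum(w * np.exp(-0.5 * (t - centers) ** 2))

print(numeric, closed_form)              # the two values agree closely
```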
For each $r>0$ define $A_i(r)$ as the closed ball of radius $r$ about $t_i$: $$ A_i(r) = \{x \in \mathbb{R}^d : ||x-t_i||\leq r\}$$
Fix $\epsilon>0$ and suppose $\epsilon$ is sufficiently small to ensure that $Dw_1>\epsilon$ and $Dw_1> Dw_2+\epsilon$. Fix $r>0$ big enough to ensure $Dsw_1e^{-r^2/2}\leq \epsilon$. Suppose the center points $t_1, ..., t_s$ are far enough apart so that the sets $A_1(r), A_2(r), ..., A_s(r)$ are disjoint.
Case 1: Suppose $t \notin \cup_{i=1}^s A_i(r)$. Then $||t-t_i||\geq r$ for all $i$ and so \begin{align} g(t) &\leq D \sum_{i=1}^s w_i e^{-\frac{r^2}{2}} \\ &\leq D s w_1 e^{-\frac{r^2}{2}} \\ &\leq \epsilon \\ &< Dw_1 \end{align}
Case 2: Suppose $t \in A_k(r)$ for some $k \in \{2, ..., s\}$. Then $||t-t_j||\geq r$ for all $j \neq k$ and so $$ g(t) \leq Dw_k + D(s-1)w_1e^{-\frac{r^2}{2}} \leq Dw_2 + \epsilon <Dw_1$$
It follows that if $t \notin A_1(r)$ then $g(t)<Dw_1<g^*$. So the supremum value $g^*$ can only be approached over points in the compact set $A_1(r)$. Since $g$ is continuous, the supremum is achieved, and any maximizer point $t^*$ must satisfy $t^* \in A_1(r)$.

Now for any point $t \in A_1(r)$ we have $||t-t_j||\geq r$ for all $j \in \{2, ..., s\}$ and so
\begin{align} g(t) &\leq Dw_1 e^{-\frac{||t-t_1||^2}{2}} + Dsw_1e^{-r^2/2} \\ &\leq Dw_1 - Dw_1(1- e^{-\frac{||t-t_1||^2}{2}}) + \epsilon\\ &< g^*- Dw_1(1- e^{-\frac{||t-t_1||^2}{2}}) + \epsilon \end{align}
In particular, if $t^*$ is a maximizer, then $t^* \in A_1(r)$ and so
$$ g^* = g(t^*) < g^* - Dw_1(1- e^{-\frac{||t^*-t_1||^2}{2}}) + \epsilon,$$
which gives
$$ Dw_1 (1- e^{-\frac{||t^*-t_1||^2}{2}}) < \epsilon. $$

Define $\delta = \frac{\epsilon}{Dw_1}$ and note by our assumptions that $0<\delta < 1$. Define $z = \frac{||t^*-t_1||^2}{2}$. It follows that $e^{-z} >1-\delta$, so
\begin{align} z &< -\log(1-\delta) \\ &= \log(\frac{1}{1-\delta})\\ &= \log(1 + \frac{\delta}{1-\delta})\\ &\leq \frac{\delta}{1-\delta} \end{align}
In particular, if $t^*$ is a maximizer then
$$ \frac{||t^*-t_1||^2}{2} \leq \frac{\delta}{1-\delta} = \frac{\epsilon}{Dw_1 - \epsilon}.$$
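To illustrate this conclusion numerically (not part of the argument), here is a minimal sketch in $d=1$ that maximizes $g$ by grid search using the closed form $g(t) \propto \sum_i w_i e^{-||t-t_i||^2/2}$; the weights, centers and grid are arbitrary choices:

```python
import numpy as np

# Sketch (d = 1): with well-separated centers, maximize g by grid search using
# the closed form g(t) = D * sum_i w_i * exp(-(t - t_i)^2 / 2).  The constant D
# is dropped since it does not change the location of the maximizer.  The
# weights, centers and grid below are arbitrary illustrative choices.
w = np.array([3.0, 2.0, 1.0])            # w_1 > w_2 > w_3 > 0
centers = np.array([0.0, 8.0, -8.0])     # t_1 = 0, pairwise distances >= 8

grid = np.linspace(-12.0, 12.0, 480_001)
values = np.exp(-0.5 * (grid[:, None] - centers[None, :]) ** 2) @ w
t_hat = grid[np.argmax(values)]

print(t_hat)   # lands essentially on t_1 = 0, not on the other centers
```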
Michael proposed an amazing answer to the question. As you can see, his answer contains two parts. In this answer, I re-use his idea in the first part and reformulate the second part by 1) using tighter inequalities and 2) using only the parameter $r$ as the varying quantity. I also address the exponential convergence of $\hat t$ at the end of the proof.
Let $W = \sum_{i=1}^s w_i$ and $\Delta = \min_{i\neq j} ||t_i-t_j||>0$. Define $\color{blue}{r=\frac{\Delta}{2}}$.
Case 1. Let $t\in A_k(r)$ for some $k\in \{ 2,...,s\}$. Then $||t-t_k||\leq r$ by definition, and since $r=\Delta/2$, the triangle inequality gives $||t-t_j||\geq \Delta - r = r$ for all $j\neq k$. Therefore \begin{equation} \begin{aligned} g(t) & \leq Dw_ke^{-\frac{||t-t_k||^2}{2}} + D(W-w_k)e^{-r^2/2}\\ & = Dw_k ( e^{-\frac{||t-t_k||^2}{2}} - e^{-r^2/2} ) + DW e^{-r^2/2}\\ & \leq Dw_2 ( e^{-\frac{||t-t_k||^2}{2}} - e^{-r^2/2} ) + DW e^{-r^2/2}\\ & \leq Dw_2 ( 1- e^{-r^2/2} ) + DWe^{-r^2/2}\\ & = Dw_2 + D(W-w_2)e^{-r^2/2}\\ & =: f(r). \end{aligned} \end{equation}
Case 2. For $t\in A=\mathbb R^d \setminus \bigcup_{i=1}^s A_i(r)$, we have $||t-t_i||> r$ for all $i\in \{1,...,s\}$, so \begin{equation} \begin{aligned} g(t) \leq DWe^{-r^2/2} \leq f(r). \end{aligned} \end{equation}
From Case 1, Case 2 and the fact that $\mathbb R^d\setminus A_1(r) \subset A \cup A_2(r) \cup ...\cup A_s(r)$, we have \begin{equation} \begin{aligned} \sup_{t\in \mathbb R^d \setminus A_1(r)} g(t) & \leq \sup_{t\in A \cup A_2(r) \cup ...\cup A_s(r)} g(t)\\ & \leq f(r)\\ & \color{blue}{<} f(r_1)\\ & = Dw_1. \end{aligned} \end{equation} Here, we further assume that $\color{blue}{r>r_1}$, where $r_1$ is the positive constant such that $f(r_1)=Dw_1$; the strict inequality $f(r)<f(r_1)$ follows because $f$ is strictly decreasing. Note that $r_1$ is well defined since $w_1>w_2$ and $D>0$.
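For concreteness (a direct computation from the definition of $f$, assuming $s\geq 2$ so that $W\geq w_1+w_2$), $r_1$ can be written down explicitly:
$$ Dw_2 + D(W-w_2)e^{-r_1^2/2} = Dw_1 \iff e^{-r_1^2/2}=\frac{w_1-w_2}{W-w_2} \iff r_1=\sqrt{2\log\frac{W-w_2}{w_1-w_2}}, $$
which is indeed well defined because $0<w_1-w_2<w_1\leq W-w_2$.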
Together with the fact that $g(t_1) \geq Dw_1$, this shows that the supremum of $g$ over $\mathbb R^d$ equals its supremum over the compact set $A_1(r)$. Since $g$ is continuous, this supremum is attained at some point $\hat t \in A_1(r)$.
Since $\hat t \in A_1(r)$, we know that \begin{equation} \begin{aligned} g(\hat t) & \leq Dw_1 e^{-\frac{||\hat t-t_1||^2}{2}} + D(W-w_1)e^{-r^2/2}\\ & = Dw_1 - Dw_1 ( 1- e^{-\frac{||\hat t-t_1||^2}{2}} ) + D(W-w_1)e^{-r^2/2}\\ & \leq g(\hat t) - Dw_1 ( 1- e^{-\frac{||\hat t-t_1||^2}{2}} ) + D(W-w_1)e^{-r^2/2}. \end{aligned} \end{equation} Cancelling $g(\hat t)$ on both sides, we get \begin{equation} \begin{aligned} Dw_1( 1- e^{-\frac{||\hat t-t_1||^2}{2}} ) \leq D(W-w_1)e^{-r^2/2}. \end{aligned} \end{equation} This inequality is equivalent to \begin{equation} \begin{aligned} - \log \left( 1- \frac{W-w_1}{w_1} e^{-r^2/2} \right)\geq \frac{||\hat t-t_1||^2}{2}. \end{aligned} \end{equation} To make sure that the logarithm is well defined, we assume in addition that $\color{blue}{r\geq r_2}$, where $r_2$ is a positive constant such that $\frac{W-w_1}{w_1} e^{-r_2^2/2}\in [0, 1)$.
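For concreteness (again a direct computation, not part of the original argument), the requirement $\frac{W-w_1}{w_1}e^{-r_2^2/2}<1$ is equivalent to
$$ r_2^2 > 2\log\frac{W-w_1}{w_1}, $$
which holds for every $r_2>0$ when $W\leq 2w_1$; so any $r_2$ strictly above $\sqrt{2\max\{\log\frac{W-w_1}{w_1},\,0\}}$ is admissible. Comparing with the explicit formula for $r_1$ above, one can check that $\frac{W-w_2}{w_1-w_2}>\frac{W-w_1}{w_1}$ whenever $W\geq w_1+w_2$, which is consistent with the remark below that $r_1>r_2$.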
So, if $r=\Delta/2\rightarrow +\infty$, then both conditions $r>r_1$ and $r\geq r_2$ are eventually satisfied (actually, one can show that $r_1>r_2$). As a consequence, the left-hand side converges to $0$ and thus $\hat t \rightarrow t_1$. Furthermore, since $-\log (1-x)\sim x$ as $x\to 0$, we conclude that $\hat t$ converges to $t_1$ at least exponentially fast, i.e. $$||\hat t - t_1|| \lesssim C e^{-r^2/4},$$ where $C=\sqrt{ \frac{2(W-w_1)}{w_1}}$.
Hence, the Claim indeed holds true.
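As a numerical illustration of the rate above, here is a minimal sketch in $d=1$ with arbitrary weights and a symmetric center layout; it checks that the error $||\hat t - t_1||$ stays below the bound $C e^{-r^2/4}$ (which is an upper bound only, not claimed to be tight; the observed error is in fact much smaller):

```python
import numpy as np

# Sketch (d = 1): for growing separation Delta, locate the maximizer t_hat of g
# by grid search (closed form, constant D dropped) and compare |t_hat - t_1|
# with the bound C * exp(-r^2 / 4), where r = Delta / 2 and
# C = sqrt(2 (W - w_1) / w_1).  Weights and center layout are arbitrary choices.
w = np.array([3.0, 2.0, 1.0])
W, w1 = w.sum(), w[0]
C = np.sqrt(2.0 * (W - w1) / w1)

for Delta in [4.0, 6.0, 8.0, 10.0]:
    centers = np.array([0.0, Delta, -Delta])                # t_1 = 0
    grid = np.linspace(-Delta - 2.0, Delta + 2.0, 800_001)  # covers all centers
    values = np.exp(-0.5 * (grid[:, None] - centers[None, :]) ** 2) @ w
    t_hat = grid[np.argmax(values)]
    r = Delta / 2.0
    print(Delta, abs(t_hat), C * np.exp(-r ** 2 / 4))       # error vs. bound
```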