How small can the probability $\Bbb{P}(X_1+X_2 +\dots +X_n < n + 1)$ be if $\Bbb{E}(X_i) = 1$?

Let $X_1, X_2, \dots, X_n$ be $n$ nonnegative, independent, identically distributed random variables, each with expectation $1$: $$\forall\, 1 \le i \le n: \quad \Bbb{E}(X_i) = 1$$

How small can the probability $\Bbb{P}(X_1+X_2 +\dots +X_n < n + 1)$ be?

Ideally, for fixed $n$, we would like to find the infimum of this probability over all possible distributions of $X_i$, and then see whether this infimum is actually attained.

The natural idea is to rewrite the probability in question as $$\Bbb{P}\left(\dfrac{X_1+X_2 +\dots +X_n}{n} < 1 + \dfrac{1}{n}\right).$$ This brings in the average of $X_1, X_2, \dots, X_n$, but I'm not sure what can be done next.

I'm also interested in solving the generalized version of the problem for random variables that are not necessarily identically distributed.

Any suggestions, hints on how to approach this problem, and useful comments would be greatly appreciated.


Define $c_n$ as the infimum of $P[X_1 + \dots + X_n<n+1]$ over all distributions for i.i.d. nonnegative random variables $\{X_i\}$ with $E[X_i]=1$. Here I prove the simple upper and lower bounds: $$ \frac{1}{n+1} \leq c_n \leq \left(1-\frac{1}{n+1}\right)^n \quad \forall n \in \{1, 2, 3, \dots\}$$ Notice that the upper and lower bounds meet when $n=1$, so $c_1=1/2$.
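As a quick sanity check, here is a small Python sketch of my own (not part of the original argument) that tabulates both bounds for small $n$. The two sides meet only at $n=1$; as $n$ grows, the upper bound decreases toward $1/e\approx0.368$ while the lower bound goes to $0$.

```python
# Tabulate the lower bound 1/(n+1) and the upper bound (1 - 1/(n+1))^n
# for small n, to see how far apart the two bounds are.
for n in range(1, 8):
    lower = 1 / (n + 1)
    upper = (1 - 1 / (n + 1)) ** n
    print(f"n={n}:  {lower:.4f} <= c_n <= {upper:.4f}")
```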

Achievability (upper bound):

Consider the nonnegative random variables $$ X_i = \left\{ \begin{array}{ll} n+1 &\mbox{ , with prob $\frac{1}{n+1}$} \\ 0 & \mbox{ , with prob $1-\frac{1}{n+1}$} \end{array} \right.$$ These have $E[X_i]=1$, and since the sum already reaches $n+1$ whenever even one $X_i$ is nonzero: $$P[X_1 + \dots + X_n<n+1] = P[\mbox{all $X_i$ are zero}] = \left(1-\frac{1}{n+1}\right)^n$$ Hence, $c_n \leq \left(1-\frac{1}{n+1}\right)^n$.
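A Monte Carlo sketch (my own; the choice $n=5$ and the sample size are purely illustrative) should reproduce $\left(1-\frac{1}{n+1}\right)^n$:

```python
import random

# Estimate P[X_1 + ... + X_n < n+1] for the two-point distribution above:
# X_i = n+1 with probability 1/(n+1), else 0, so that E[X_i] = 1.
n, trials = 5, 200_000
hits = 0
for _ in range(trials):
    s = sum((n + 1) if random.random() < 1 / (n + 1) else 0 for _ in range(n))
    if s < n + 1:
        hits += 1
print("estimate:", hits / trials)               # ~ 0.40 for n = 5
print("exact:   ", (1 - 1 / (n + 1)) ** n)      # (5/6)^5 ≈ 0.4019
```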

Lower bound:

Let $\{X_i\}$ be any (possibly dependent and non-identically distributed) nonnegative random variables with $E[X_i]=1$. By Markov's inequality: $$ P[X_1 + \dots + X_n\geq n+1] \leq \frac{E[X_1+\dots+X_n]}{n+1} = \frac{n}{n+1}$$ so $P[X_1 + \dots + X_n < n+1] \geq \frac{1}{n+1}$, and therefore $c_n \geq \frac{1}{n+1}$.
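For illustration, a numerical check of this bound, using a few mean-$1$ distributions of my own choosing (none come from the original post): the empirical probability stays above $\frac{1}{n+1}$ in every case.

```python
import random

# For several nonnegative mean-1 distributions, the empirical value of
# P[X_1 + ... + X_n < n+1] should never fall below the Markov bound 1/(n+1).
n, trials = 4, 100_000
samplers = {
    "constant 1":   lambda: 1.0,
    "Exp(1)":       lambda: random.expovariate(1.0),
    "Uniform(0,2)": lambda: random.uniform(0, 2),
    "two-point":    lambda: (n + 1) if random.random() < 1 / (n + 1) else 0.0,
}
for name, draw in samplers.items():
    p = sum(sum(draw() for _ in range(n)) < n + 1 for _ in range(trials)) / trials
    print(f"{name:12s} P = {p:.4f}  (lower bound {1 / (n + 1):.4f})")
```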


EDIT: Michael's upper bound is locally optimal, not globally as I had originally stated. Specifically, there is some neighborhood $N$ of the distribution $\alpha:=\frac{n}{n+1}\delta_0+\frac1{n+1}\delta_{n+1}$ (in the weak topology) such that $\mathbb P(X_1+\ldots+X_n<n+1)<\mathbb P(Y_1+\ldots+Y_n<n+1)$ whenever $(X_i),(Y_i)$ are iid with $X_1\sim\alpha$ and $Y_1\sim\mu\in N\setminus\{\alpha\}$.

To see this, let $S$ be the set of probability measures $\mu$ on $[0,\infty)$ such that $\int_0^\infty x\,\mu(dx)=1$, and for $\mu\in S$ define

$$E_n[\mu]:=\int_{\sum_{i=1}^nx_i<n+1}\mu(dx_1)\ldots\mu(dx_n).$$

Note that if $(X_i)$ are iid nonnegative random variables with $\mathbb E[X_1]=1$, and $\mu$ is the law of $X_1$, then $E_n[\mu]=\mathbb P(X_1+\ldots+X_n<n+1)$. Let $\alpha=\frac{n}{n+1}\delta_0+\frac1{n+1}\delta_{n+1}$, i.e. $\alpha$ is the distribution of the random variable Michael defines for the upper bound.

Observe that $S$ is convex, so given arbitrary $\mu\in S\setminus\{\alpha\}$ the function $\Phi_n(t):=E_n[(1-t)\alpha+t\mu]$ is well-defined for $t\in[0,1]$. The formula for $\Phi_n(t)$ is complicated, but we do not need much of it:

$$\Phi_n(t)=c_0+\left(\sum_{i=1}^n\int_{\sum_jx_j<n+1}\mu(dx_i)\prod_{k\neq i}\alpha(dx_k)-\int_{\sum_jx_j<n+1}n\prod_{k=1}^n\alpha(dx_k)\right)t+\sum_{i=2}^nc_it^i$$

where the $c_i$ are constants depending on $\mu$ but not $t$. This yields

\begin{align*} \Phi_n'(0) &=n\left(\int_{\sum_j x_j<n+1}\mu(dx_1)\prod_{k=2}^n\alpha(dx_k)-\int_{\sum_j x_j<n+1}\prod_{k=1}^n\alpha(dx_k)\right)\\ &=n\left[\mathbb P\left(Y_1+\sum_{i=2}^nX_i<n+1\right)-\mathbb P\left(\sum_{i=1}^nX_i<n+1\right)\right] \end{align*}

where $Y_1$ is a random variable with law $\mu$, independent of the iid random variables $X_i$ which have law $\alpha$. Since $Y_1\ge0$ and $\mathbb E[Y_1]=1$, Markov's inequality gives $\mathbb P(Y_1\ge n+1)\le\frac1{n+1}$, with equality only when $\mu$ is supported on $\{0,n+1\}$, which (given the mean constraint) forces $\mu=\alpha$. As $\mu\neq\alpha$, the inequality is strict, so $\mathbb P(Y_1<n+1)>1-\frac1{n+1}$ and hence

$$\mathbb P\left(Y_1+\sum_{i=2}^nX_i<n+1\right)=\mathbb P(Y_1<n+1)\mathbb P(X_1=0)^{n-1}>\left(1-\frac1{n+1}\right)^n$$ (the sum is below $n+1$ precisely when $X_2=\dots=X_n=0$ and $Y_1<n+1$), and thus $\Phi_n'(0)>0$. This implies that there exists $\delta>0$ such that $\Phi_n(0)<\Phi_n(t)$ for all $t\in(0,\delta)$; since $\mu$ was arbitrary, it follows that $E_n$ has a strict local minimum at $\alpha$.
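To illustrate the local-minimum claim numerically (my own sketch, not part of the proof), one can pick the particular law $\mu=\delta_1$, the unit mass at $1$, which lies in $S\setminus\{\alpha\}$, and estimate $\Phi_n(t)$ by Monte Carlo; the estimates increase as $t$ moves away from $0$, consistent with $\Phi_n'(0)>0$.

```python
import random

# Estimate Phi_n(t) = E_n[(1-t)*alpha + t*mu] with mu = delta_1 (unit mass
# at 1) by Monte Carlo.  Here alpha puts mass n/(n+1) at 0 and 1/(n+1) at n+1.
n, trials = 3, 400_000

def draw(t):
    # Sample once from the mixture (1 - t)*alpha + t*delta_1.
    if random.random() < t:
        return 1.0                                                # mu component
    return (n + 1.0) if random.random() < 1 / (n + 1) else 0.0    # alpha component

for t in [0.0, 0.05, 0.1, 0.2]:
    p = sum(sum(draw(t) for _ in range(n)) < n + 1 for _ in range(trials)) / trials
    print(f"t = {t:.2f}: Phi_n(t) ~ {p:.4f}")
print("Phi_n(0) exact:", (n / (n + 1)) ** n)    # 0.421875 for n = 3
```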

EDIT: I had originally written here that $E_n$ is convex and concluded the result from that. However, as Michael points out in the comments, this assertion is not true. It may still be possible to continue with this argument and conclude that $\alpha$ is the global minimizer of $E_n$, but at this point the best I have is that $\alpha$ is a local minimizer.


After having seen fedja's comment and reading related material, I find it unlikely that anyone will be able to solve the generalized version here, as it is an open research problem.


I believe the following is a counterexample to Jason's claim that $E_n$ is a convex function. Consider $n=2$. Define:

\begin{align} X &= \left\{ \begin{array}{ll} 1/2 &\mbox{ with prob $\frac{1}{2}$} \\ 3/2 & \mbox{ with prob $\frac{1}{2}$} \end{array} \right.\\ Y &= \left\{ \begin{array}{ll} 0 &\mbox{ with prob $1-\frac{1}{\theta}$} \\ \theta & \mbox{ with prob $\frac{1}{\theta}$} \end{array} \right. \end{align} where $\theta$ is chosen so that $\theta>1$ and $(1-\frac{1}{\theta})^2 = 3/4$, that is, $\theta = 2(2+\sqrt{3})\approx 7.46$. Note that $X$ and $Y$ are nonnegative random variables with $E[X]=E[Y]=1$. Let $X_1, X_2$ be independent copies of $X$ and $Y_1, Y_2$ be independent copies of $Y$. Then: $$ P[X_1+X_2<3] = 1-P[X_1=X_2=3/2] = 3/4$$ $$ P[Y_1 + Y_2 < 3] = P[Y_1=Y_2=0] = (1-1/\theta)^2 = 3/4 $$ (the second equality holds because $\theta>3$). To show convexity fails, it suffices to define a random variable $Z$ whose law is a mixture of the laws of $X$ and $Y$, yet $P[Z_1+Z_2<3]>3/4$ for $Z_1, Z_2$ independent copies of $Z$; convexity of $E_2$ would force this probability to be at most $\frac{1}{2}\cdot\frac{3}{4}+\frac{1}{2}\cdot\frac{3}{4} = 3/4$.

Define $Z$ by independently flipping a fair coin and choosing $Z=X$ if heads and $Z=Y$ otherwise: $$ Z = \left\{ \begin{array}{ll} 0 &\mbox{ with prob $\frac{1}{2}(1-\frac{1}{\theta})$} \\ 1/2 & \mbox{ with prob $\frac{1}{4}$} \\ 3/2 & \mbox{ with prob $\frac{1}{4}$} \\ \theta & \mbox{ with prob $\frac{1}{2}\frac{1}{\theta}$} \end{array} \right.$$ Let $Z_1, Z_2$ be independent copies of $Z$. Using $1-\frac{1}{\theta}=\frac{\sqrt{3}}{2}$, so that $P[Z_1=0]=\frac{\sqrt{3}}{4}$: \begin{align} P[Z_1+Z_2<3] &= P[Z_1=0]P[Z_2 \in \{0, 1/2, 3/2\}] + P[Z_1=1/2]P[Z_2 \in \{0, 1/2, 3/2\}] \\ &\quad +P[Z_1=3/2]P[Z_2 \in \{0, 1/2\}] \\ &=\frac{3}{8}+ \frac{\sqrt{3}}{4} \approx 0.808 \\ &> 3/4 \end{align}
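A short numerical sketch of my own (the helper name `prob_sum_below` is illustrative) that reproduces all three probabilities and confirms the violation:

```python
from math import sqrt, isclose

theta = 2 * (2 + sqrt(3))                 # chosen so that (1 - 1/theta)^2 = 3/4
X = {0.5: 0.5, 1.5: 0.5}
Y = {0.0: 1 - 1 / theta, theta: 1 / theta}
# Fair-coin mixture of the two laws.
Z = {v: p / 2 for v, p in X.items()}
for v, p in Y.items():
    Z[v] = Z.get(v, 0.0) + p / 2

def prob_sum_below(law, cutoff=3):
    # P[W_1 + W_2 < cutoff] for W_1, W_2 i.i.d. with the given discrete law.
    return sum(p * q for x, p in law.items() for y, q in law.items() if x + y < cutoff)

print(prob_sum_below(X))                              # 0.75
print(prob_sum_below(Y))                              # 0.75 (up to float error)
print(prob_sum_below(Z))                              # ~ 0.8080 > 0.75
print(isclose(prob_sum_below(Z), 3/8 + sqrt(3)/4))    # True
```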