Let $X_1, X_2, \dots$ be i.i.d. random variables with $E(X_1) = \mu$, $Var(X_1) = \sigma^2 > 0$, and let $\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$ be the sample average estimator.

Is there a way to calculate how many samples are needed to obtain an estimate that is "$\epsilon$-accurate"? From Chebyshev's inequality I can get

\begin{align} P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \end{align} and can conclude that the bound on the error probability decays like $1/n$.
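Inverting this bound answers the sample-size question directly: to guarantee $P(|\bar{X}_n - \mu| \geq \epsilon) \leq \delta$ it suffices to take
$$ n \geq \frac{\sigma^2}{\delta \epsilon^2}; $$
for example, $\sigma^2 = 1$, $\epsilon = 0.01$, $\delta = 0.05$ requires $n \geq 200{,}000$.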

Are there better bounds for the sample average estimator?


Solution 1:

You may have a look at large deviations theory.

In this case, when the distribution is regular enough, Cramér's theorem states:

$$ \frac 1n\log P\left(\frac 1n [X_1 + \dots + X_n] > \mu + \epsilon\right) \to -\left[\sup_{t\in \Bbb R} \left\{ (\mu + \epsilon)t - \log E e^{tX_1} \right\} \right] $$

The condition in that case is that the moment generating function $E e^{tX_1}$ is finite. So the correct rate of convergence is $$ P\left(\left|\frac 1n [X_1 + \dots + X_n] - \mu\right| \ge \epsilon\right) \simeq e^{-\alpha n} $$ with $\alpha = \sup_{t\in \Bbb R} \left\{ (\mu + \epsilon)t - \log E e^{tX_1} \right\}$, i.e. minus the right-hand side of the preceding limit. (Strictly, the two-sided event combines two tails, each with its own rate, and the smaller of the two rates dominates.)
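As a concrete check, suppose for illustration that $X_1 \sim N(\mu, \sigma^2)$, so that $\log E e^{tX_1} = \mu t + \sigma^2 t^2/2$. Then the supremum is attained at $t = \epsilon/\sigma^2$ and can be computed in closed form:
$$ \alpha = \sup_{t\in \Bbb R} \left\{ \epsilon t - \frac{\sigma^2 t^2}{2} \right\} = \frac{\epsilon^2}{2\sigma^2}, $$
so the tail probability decays like $e^{-n\epsilon^2/(2\sigma^2)}$, matching, up to polynomial factors, the familiar Gaussian tail.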

Solution 2:

With a finite variance, you have the Central Limit Theorem, which broadly speaking tells you that for large $n$ the sample mean is approximately normally distributed with mean $\mu$ and variance $\frac{\sigma^2}{n}$. So for large $n$ $$P(|\bar{X}_n - \mu| \geq \epsilon) \approx 2\Phi\left(-\frac{\epsilon \sqrt{n}}{\sigma}\right)$$ where $\Phi$ is the standard normal CDF.
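Inverting this approximation gives a much sharper sample-size rule: to push the probability below $\delta$ you need $\epsilon\sqrt{n}/\sigma \gtrsim z_{1-\delta/2}$, where $z_{1-\delta/2} = \Phi^{-1}(1 - \delta/2)$, i.e.
$$ n \gtrsim \left(\frac{\sigma z_{1-\delta/2}}{\epsilon}\right)^2. $$
With $\sigma^2 = 1$, $\epsilon = 0.01$, $\delta = 0.05$ this gives $n \gtrsim (1.96/0.01)^2 \approx 38{,}400$, compared with the $200{,}000$ that the Chebyshev bound demands.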

As an illustration, with $\sigma^2=1$ and $\epsilon=0.01$, the probabilities given by the Central Limit Theorem approximation and by your Chebyshev bound for various values of $n$ are about

       n   CLT      Chebyshev bound 
      10  0.975     1000
     100  0.920      100
    1000  0.752       10
   10000  0.317        1
  100000  0.0016       0.1
 1000000  1.5e-23      0.01
10000000  1.8e-219     0.001

with the Chebyshev bound being extremely loose for large $n$ and vacuous for smaller $n$, where it gives an "upper bound" greater than $1$.
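
For completeness, here is a minimal Python sketch that regenerates the table above (assuming scipy is available; $\sigma^2 = 1$ and $\epsilon = 0.01$ as before):

    from scipy.stats import norm

    sigma, eps = 1.0, 0.01  # sigma^2 = 1, epsilon = 0.01, as in the table

    print(f"{'n':>10}  {'CLT':>10}  {'Chebyshev':>10}")
    for n in (10**k for k in range(1, 8)):
        # CLT approximation: 2 * Phi(-eps * sqrt(n) / sigma)
        clt = 2 * norm.cdf(-eps * n**0.5 / sigma)
        # Chebyshev bound: sigma^2 / (n * eps^2); can exceed 1 (vacuous)
        cheb = sigma**2 / (n * eps**2)
        print(f"{n:>10d}  {clt:>10.3g}  {cheb:>10.3g}")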