My notes on confidence give this question:

An investigator is interested in the amount of time internet users spend watching TV per week. He assumes $\sigma = 3.5$ hours, samples $n=50$ users, and takes the sample mean to estimate the population mean $\mu$.

Since $n=50$ is large we know that $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is approximately standard normal. So, with probability $1-\alpha = 0.99$, the maximum error of estimate is $E = z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \approx 1.27$ hours.
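(For reference, the arithmetic behind the $1.27$, using $z_{\alpha/2} = z_{0.005} \approx 2.576$, is
$$E = z_{0.005}\,\frac{\sigma}{\sqrt{n}} \approx 2.576 \times \frac{3.5}{\sqrt{50}} \approx 2.576 \times 0.495 \approx 1.27 \text{ hours}.)$$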

The investigator collects the data and obtains $\bar{X}=11.5$ hours. Can he still assert with 99% probability that the error is at most 1.27 hours?

The answer given is:

No, he cannot, because the probability describes the method/estimator, not the particular result. We say instead that "we conclude with 99% confidence that the error does not exceed 1.27 hours."

I am confused. What is this difference between probability and confidence? Is it related to confidence intervals? Is there an intuitive explanation for the difference?


Solution 1:

Your question is a natural one and the answer is controversial, lying at the heart of a decades-long debate between frequentist and Bayesian statisticians. Statistical inference is not mathematical deduction. Philosophical issues arise when one takes a bit of information in a sample and tries to make a helpful statement about the population from which the sample was chosen. Here is my attempt at an elementary explanation of these issues as they arise in your question. Others may have different views and post different explanations.

Suppose you have a random sample $X_1, X_2, \dots, X_n$ from $Norm(\mu, \sigma)$ with $\sigma$ known and $\mu$ to be estimated. Then $\bar X \sim Norm(\mu, \sigma/\sqrt{n})$ and we have $$P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$ After some elementary manipulation, this becomes $$P(\bar X - 1.96\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\sigma/\sqrt{n}) = 0.95.$$ According to the frequentist interpretation of probability, the two displayed equations mean the same thing: over the long run, the event inside the parentheses will be true 95% of the time. This interpretation holds as long as $\bar X$ is viewed as a random variable based on a random sample of size $n$ from the normal population specified at the start. Notice that the second equation must be read as a statement about the random interval $\bar X \pm 1.96\sigma/\sqrt{n}$: with probability 0.95, that random interval happens to include the unknown mean $\mu.$
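To make the "long run" reading concrete, here is a minimal simulation sketch (Python with NumPy; the values of $\mu$, $\sigma$, and $n$ are arbitrary choices for illustration, not from the question). Roughly 95% of the random intervals constructed this way cover the fixed $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 3.5, 50    # arbitrary "true" values chosen for the simulation
z = 1.96                        # z_{0.025} for a 95% interval
reps = 100_000

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)           # one random sample
    xbar = x.mean()
    half = z * sigma / np.sqrt(n)          # half-width of the interval
    if xbar - half <= mu <= xbar + half:   # does this realized interval cover mu?
        covered += 1

print(covered / reps)   # close to 0.95 over the long run
```

Any single realized interval either covers $\mu$ or it does not; the 95% is a property of the procedure repeated many times.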

However, when we have a particular sample and the numerical value of an observed mean $\bar X,$ the frequentist "long run" approach to probability is in potential conflict with a naive interpretation of the interval. In this particular case $\bar X$ is a fixed observed number and $\mu$ is a fixed unknown number. Either $\mu$ lies in the interval or it doesn't. There is no "probability" about it. The process by which the interval is derived leads to coverage in 95% of cases over the long run. As shorthand for the previous part of this paragraph, it is customary to use the word confidence instead of probability.

There is really no difference between the two words. It is just that the proper frequentist use of the word probability becomes awkward, and people have decided to use confidence instead.

In a Bayesian approach to estimation, one establishes a probability framework for the experiment at hand from the start by choosing a "prior distribution." Then a Bayesian probability interval (sometimes called a credible interval) is based on a melding of the prior distribution and the data. A difficulty Bayesian statisticians may have in helping nonstatisticians understand their interval estimates is to explain the origin and influence of the prior distribution.
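As a rough illustration (not part of the original question), here is a sketch of a conjugate Bayesian interval for the TV-watching example, again in Python/NumPy with SciPy assumed available; the prior parameters `m0` and `s0` are made up for the example:

```python
import numpy as np
from scipy.stats import norm

# Data summary from the question; the prior parameters below are hypothetical.
sigma, n, xbar = 3.5, 50, 11.5
m0, s0 = 10.0, 5.0                 # assumed Normal(m0, s0^2) prior on mu

# Conjugate update: Normal prior + Normal likelihood with sigma known
prec_post = 1 / s0**2 + n / sigma**2
mean_post = (m0 / s0**2 + n * xbar / sigma**2) / prec_post
sd_post = np.sqrt(1 / prec_post)

# 99% credible interval: a direct probability statement about mu,
# but only conditional on the chosen prior.
lo, hi = norm.ppf([0.005, 0.995], loc=mean_post, scale=sd_post)
print(mean_post, (lo, hi))
```

Because the prior makes $\mu$ a random variable, the resulting 99% interval is a genuine probability statement about $\mu$, but it depends on the prior you chose.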

Solution 2:

I think probability is used for random variables. "Variable" here means something that does not have a constant value and can take a range of values.

First, note that the population mean (or the true mean) is an unknown constant, so you cannot make probability statements about it. However, the sample is random, the sample mean is random, and the confidence interval you create based on the sample mean is also random. So you may make probability statements about the sample mean or the confidence interval.

By the central limit theorem, the sample mean follows a normal distribution (when the sample size is large enough) with mean equal to the population mean. Based on this distribution you can calculate the probability that the sample mean falls within a specific distance of the distribution's mean (which is unknown). However, the distance between $x$ and $y$ equals the distance between $y$ and $x$. So, once the sample mean is obtained, one can use the same probability to say that the true mean (which is unknown) lies within that distance of the sample mean (which is now a fixed number). But, as we said before, the true mean is not a random variable, and so statisticians use the term confidence interval.
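For instance, with the numbers from the original question ($\bar{x}=11.5$ hours, $E\approx 1.27$ hours), the observed 99% interval is
$$\bar{x} \pm E = 11.5 \pm 1.27 \;\Longrightarrow\; [10.23,\ 12.77] \text{ hours},$$
and the statement is "we are 99% confident that $\mu$ lies in $[10.23, 12.77]$," not "$\mu$ lies in $[10.23, 12.77]$ with probability 0.99."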

In short, in this case confidence and probability are closely related and both come from the same probability distribution; the terminology just assigns them different names.