Concentration of measure vs large deviation
When reading some probability publications I am always not sure why they call this or that inequality a 'concentration inequality' or 'large deviation inequality'. For me these (concentration of measure and large deviation theory) just describe the same phenomenon. So I ask, is there a formal difference between concentration and large deviation inequalities? Or are these really different concepts?
Solution 1:
From my point of view concentration inequalities and large deviation theory are indeed different concepts - but since there are a lot of well-known concentration inequalities which have the form of large deviation estimates it is not surprising that there is sometimes no strict distinction between both notions.
Concentration inequalites are inequalities that bound probabilies of deviations by the random variable $X$ from its mean or median, i.e. upper bounds for probabilities of the form
$$\mathbb{P}\bigg( |X-\mathbb{E}X)| > r \bigg) \quad \text{or} \quad \mathbb{P}\bigg( |X-m(X)| > r \bigg)$$
where $m(X)$ denotes the median of $X$. Some basic concentration inequalites are the Markov inequality,
$$\mathbb{E}(|X| \geq r) \leq \frac{\mathbb{E}(|X|)}{r},$$
and the Tschebysheff inequality,
$$\mathbb{E}(|X-\mathbb{E}X| \geq r) \leq \frac{\text{var} \, X}{r^2}.$$
Often one is interested in the asymptotics of such probabilites, i.e. the asymptotic decay of
$$\mathbb{P}\bigg( |X_n-\mathbb{E}(X_n)| > r \bigg)$$
as $n \to \infty$. Now here comes large deviation theory into play: Large deviation theory yields asymptotic estimates of the form
$$\mathbb{P}\bigg( |X_n-\mathbb{E}(X_n)| > r \bigg) \leq e^{-n I(r)} \tag{1}$$
where $I(r) \geq 0$ is called the rate function. The Chernoff bound, Hoeffding inequality or McDiarmid's inequality are well-known results of this type. This means that large deviation theory provides estimates on an exponential scale. However, concentration inequalities do not need to be of the form $(1)$. A different approach uses so-called concentration functions.
Let me finally mention that large deviation theory deals more generally with the asymptotic behavior of probabilites of the form
$$\mathbb{P} \bigg( X_n \in A \bigg),$$
for some fixed set $A$, on an exponential scale, i.e.
$$\mathbb{P} \bigg(X_n \in A \bigg) \approx \exp \bigg(-n \cdot I(A) \bigg).$$
Obviously, the estimates in $(1)$ are a particular case if we choose $A=[\mathbb{E}(X_n)-r,\mathbb{E}(X_n)+r]$.