Different versions of the functional central limit theorem (a.k.a. Donsker's theorem)?
I have seen several versions of the functional central limit theorem (see the end of this post). I am confused and hope someone could help clarify their relations and differences. For example, I am wondering:
- Do Billingsley's Probability and Measure and his Convergence of Probability Measures define the process $Y_n$ differently?
- Is the conclusion in Billingsley's Convergence of Probability Measures different from the conclusion in his Probability and Measure, in that the latter says a process equivalent to $Y_n$, but defined on another probability space, is uniformly close in probability to a Wiener process, while the former says $Y_n$ itself converges weakly to a Wiener process? (Note that convergence in probability implies weak convergence.)
- How are the two versions of the functional central limit theorem from Billingsley and the version from Kallenberg related, and how do they differ?
- How is Wikipedia's Donsker theorem related to the two versions from Billingsley and the version from Kallenberg? I am not able to see whether $G_n$ can be written in the form of $Y_n$, and moreover $G_n$ converges in distribution to a Gaussian process that might not be a Wiener process.
PS: Four versions of the functional central limit theorem (a small simulation sketch of the constructions follows the statements):
- In Billingsley's Probability and Measure:
Theorem 37.8. Suppose that $X_1, X_2, \dots$ are independent, identically distributed random variables with mean $0$, variance $\sigma^2$, and finite fourth moments, and define $Y_n(t)$, $0 \leq t \leq 1$, by $$ Y_n(t, \omega) = \frac{1}{\sigma\sqrt{n}} S_k(\omega), \quad \text{if } \frac{k-1}{n} < t \leq \frac{k}{n}. $$ There exist (on another probability space), for each $n$, processes $[Z_n(t) : 0 \leq t \leq 1]$ and $[W_n(t) : 0 \leq t \leq 1]$ such that the first has the same finite-dimensional distributions as $[Y_n(t) : 0 \leq t \leq 1]$, the second is a Brownian motion, and $P[\sup_{t \leq 1} |Z_n(t) - W_n(t)| \geq \epsilon] \to 0$ for positive $\epsilon$.
- In Billingsley's Convergence of Probability Measures:
Theorem 8.2. If $X_1, X_2, \dots$ are independent and identically distributed with mean $0$ and variance $\sigma^2$, and if $Y_n$ is the random function defined by $$ Y_n(t, \omega) = \frac{1}{\sigma \sqrt{n}} S_{\lfloor nt \rfloor}(\omega) + (nt - \lfloor nt \rfloor) \frac{1}{\sigma \sqrt{n}} X_{\lfloor nt \rfloor + 1}(\omega), \quad 0 \leq t \leq 1, $$ then $Y_n$ converges weakly to the Wiener process $W$.
- In Kallenberg's Foundations of Modern Probability:
Theorem 14.9 (functional central limit theorem, Donsker). Let $X_1, X_2, \dots$ be i.i.d. random variables with mean $0$ and variance $1$, and define $$ Y_n(t) = \frac{1}{\sqrt{n}} \sum_{k \leq nt} X_k, \quad t \in [0,1], \; n \in \mathbb{N}. $$ Consider a Brownian motion $B$ on $[0, 1]$, and let $f : D[0, 1] \to \mathbb{R}$ be measurable and a.s. continuous at $B$. Then $f(Y_n) \to f(B)$ in distribution.
- From Wikipedia:
Donsker's theorem identifies a certain stochastic process as a limit of empirical processes. It is sometimes called the functional central limit theorem.
A centered and scaled version of the empirical distribution function $F_n$ defines an empirical process $$ G_n(x) = \sqrt{n} \, (F_n(x) - F(x)), $$ indexed by $x \in \mathbb{R}$.
Theorem (Donsker, Skorokhod, Kolmogorov). The sequence of $G_n(x)$, as random elements of the Skorokhod space $\mathcal{D}(-\infty,\infty)$, converges in distribution to a Gaussian process $G$ with zero mean and covariance given by $$ \operatorname{cov}[G(s), G(t)] = E[G(s) G(t)] = \min\{F(s), F(t)\} - F(s)F(t). $$ The process $G(x)$ can be written as $B(F(x))$, where $B$ is a standard Brownian bridge on the unit interval.
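For concreteness, here is the simulation sketch mentioned above. It is entirely my own (normal increments and uniform samples are chosen just for illustration, as are $n$ and the grid size) and simply builds the processes from the four statements side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 1.0
X = rng.normal(0.0, sigma, size=n)          # i.i.d. increments: mean 0, variance sigma^2
S = np.concatenate(([0.0], np.cumsum(X)))   # S_0 = 0, S_k = X_1 + ... + X_k
t = np.linspace(0.0, 1.0, 2001)
k = np.floor(n * t).astype(int)             # floor(nt), between 0 and n

# Statement 3's cadlag step process S_{floor(nt)}/(sigma sqrt(n)); statement 1's
# version differs only in which endpoint of each interval carries the step value.
Y_step = S[k] / (sigma * np.sqrt(n))

# Statement 2: the same values at the grid points k/n, joined by linear interpolation.
X_next = np.where(k < n, X[np.minimum(k, n - 1)], 0.0)  # X_{floor(nt)+1}; term vanishes at t = 1
Y_lin = Y_step + (n * t - k) * X_next / (sigma * np.sqrt(n))

# Statement 4: the empirical process of Uniform(0,1) samples, for which F(x) = x.
U = np.sort(rng.uniform(size=n))
F_n = np.searchsorted(U, t, side="right") / n
G_n = np.sqrt(n) * (F_n - t)

print(Y_step[-1])   # equals S_n / (sigma sqrt(n)), the classical CLT statistic
```

Note that `Y_step` and `Y_lin` agree at every grid point $t = k/n$; the interpolation only fills in the jumps.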
Thanks and regards!
Those four statements are indeed quite different! Towards unpacking their differences:
a) Notice that the processes in statements 1 and 3 are (essentially) the same: at each point $t$ (for each $\omega$) they take the value of a partial sum of the observations, with jumps at points of the form $k/n$. Both are thus considering the partial sum process as an element of $D[0,1]$, the space of cadlag functions on $[0,1]$. Statement 1 is stronger than statement 3: while statement 3 essentially says that the distribution of the partial sum process is close to that of a Brownian motion, statement 1 says that there exists a copy of the original partial sum process, defined on a potentially new probability space, together with Brownian motions defined on the same space, that are close in probability. As such (and it is a worthwhile exercise to check), statement 1 can be used to prove statement 3 relatively easily, but not the other way around. Statement 1 belongs to a family of approximation results for stochastic processes known as "weak approximations"; have a look at the Skorokhod-Dudley-Wichura theorem, and see https://encyclopediaofmath.org/wiki/Skorokhod_theorem. While it may seem odd that all random variables must potentially be redefined on a new probability space, the necessity of doing so has a simple and understandable reason: the original sample space for the observations may simply not be rich enough to support a Brownian motion. Skorokhod's original proof works by constructing all variables on the sample space $(0,1)$ equipped with Lebesgue measure.
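As a quick numerical illustration of statement 3 (a sketch of my own, assuming standard normal increments): the functional $f(x) = \sup_{t \leq 1} x(t)$ is continuous with respect to the sup-norm, hence a.s. continuous at $B$, and by the reflection principle $P(\sup_{t \leq 1} W(t) \leq a) = 2\Phi(a) - 1$, which the simulated law of $\sup_t Y_n(t) = \max(0, \max_k S_k)/\sqrt{n}$ should approach:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, reps = 500, 10000
S = np.cumsum(rng.normal(size=(reps, n)), axis=1)
# sup_t Y_n(t) = max(0, max_k S_k) / sqrt(n); the 0 comes from Y_n(0) = 0
sup_Yn = np.maximum(S.max(axis=1), 0.0) / np.sqrt(n)

for a in (0.5, 1.0, 2.0):
    print(a, (sup_Yn <= a).mean(), 2 * norm.cdf(a) - 1)  # empirical vs. limiting CDF
```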
b) Statement 2 considers a modified partial sum process that, rather than having jumps, is continuously interpolated using linear interpolation. The processes in statements 1/3 and 2 agree at the points of the form $k/n$. The point of considering this process rather than the one in statements 1/3 is essentially mathematical convenience: it takes values in the space $C[0,1]$ of continuous functions, which is a complete and separable metric space when equipped with the sup-norm $\|x-y\| = \sup_{t\in [0,1]}|x(t)-y(t)|$. Separability is a key tool in establishing many asymptotic results for measures defined on metric spaces, and the space $D[0,1]$ equipped with the sup-norm is NOT separable. As developed in Chapter 3 of Billingsley's 1968 book, one can instead define a metric on $D[0,1]$, called the Skorokhod metric, that makes $D[0,1]$ separable and under which many functionals of statistical/probabilistic interest are continuous, thereby circumventing the need to transform the partial sum process into an element of $C[0,1]$, which is admittedly somewhat clunky.
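In fact the interpolation changes very little: on $[k/n, (k+1)/n]$ the step and interpolated processes differ by at most $|X_{k+1}|/(\sigma\sqrt{n})$, so their sup-distance is exactly $\max_{k \leq n} |X_k|/(\sigma\sqrt{n})$, which tends to $0$ in probability whenever the $X_k$ have finite variance. A one-line check (my own sketch, standard normal increments):

```python
import numpy as np

rng = np.random.default_rng(2)
for n in (10**2, 10**4, 10**6):
    X = rng.normal(size=n)
    # sup-distance between the step and interpolated versions of Y_n
    print(n, np.abs(X).max() / np.sqrt(n))
```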
An even slicker way of handling this has been developed more recently, sometimes called weak convergence in the Hoffmann-Jørgensen sense. In this framework weak convergence is defined using outer expectations, so that processes that are not continuous (and possibly not even Borel measurable), such as the standard partial sum process, can have their weak convergence considered directly in $D[0,1]$ equipped with the sup-norm, since the weak limit, a Brownian motion, lives in the separable subspace $C[0,1]$. This theory is comprehensively developed in van der Vaart, A. W. and Wellner, J. A., Weak Convergence and Empirical Processes.
c) Statement 4 is a statement about the weak convergence of the standard empirical process, which is analogous to statement 3 for the partial sum process. Note that the limit here is a Brownian bridge composed with $F$ rather than a Brownian motion: the empirical process is pinned down at $x = \pm\infty$, where $G_n = 0$. Donsker's original papers on the topic consider these two cases separately, and the development of results in this vein since then has often followed this pattern.
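A matching sketch for the empirical process (again my own, taking Uniform(0,1) samples so that $F(x) = x$): $\sup_x |G_n(x)|$ is $\sqrt{n}$ times the Kolmogorov-Smirnov statistic, and its law should approach that of $\sup_t |B(t)|$ for a Brownian bridge $B$, i.e. the Kolmogorov distribution, available in scipy as kstwobign:

```python
import numpy as np
from scipy.stats import kstwobign

rng = np.random.default_rng(3)
n, reps = 500, 10000
U = np.sort(rng.uniform(size=(reps, n)), axis=1)
i = np.arange(1, n + 1)
# D_n = sup_x |F_n(x) - x|, computed from the order statistics U_(1) <= ... <= U_(n)
D = np.maximum(i / n - U, U - (i - 1) / n).max(axis=1)
sup_Gn = np.sqrt(n) * D

for a in (0.8, 1.0, 1.5):
    print(a, (sup_Gn <= a).mean(), kstwobign.cdf(a))  # empirical vs. Kolmogorov CDF
```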