What does it mean for a random variable to converge in $L^2$?

Solution 1:

Your analysis has a flaw. The term with probability $\frac{1}{n^2}$ ignores the fact that $X_n=n$ on that event. To show convergence in $L^2$ you need to verify the Cauchy criterion, i.e. that $A=\lim_{n,m\to \infty}E((X_m-X_n)^2)=0$. Expanding the square gives $E((X_m-X_n)^2)=E(X_m^2)-2E(X_mX_n)+E(X_n^2)$. However, $E(X_n^2)\to 1$ while $E(X_n)\to 0$, so unless you assume strong correlation between $X_n$ and $X_m$, the cross term vanishes and $A=2$.

Thus the sequence of random variables does not converge in $L^2$.
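A quick numerical sanity check of this claim (a minimal sketch, assuming the sequence from the question is $X_n=n$ with probability $\frac{1}{n^2}$ and $X_n=0$ otherwise, with the variables drawn independently):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_X(n, size):
    """Draw samples of X_n: equal to n with probability 1/n^2, else 0."""
    return n * (rng.random(size) < 1.0 / n**2)

# Estimate E((X_m - X_n)^2) for two large indices, assuming independence.
n, m, trials = 50, 80, 2_000_000
xn = sample_X(n, trials)
xm = sample_X(m, trials)
print(np.mean((xm - xn) ** 2))  # close to 2, not 0
```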

Solution 2:

My answer is going to lay out what one has to know to understand what it means for a general sequence of random variables to converge in $\mathcal{L}^2$, and when that even makes sense as an expression. Not all of it is required if one is looking at less general cases, for example random variables on finite state spaces. I realised halfway through typing it out that you are probably looking for an answer to your specific situation, but I thought it might be a shame to throw it away.

The answer to the question is closely related to the proper way of defining a random variable in the first place. Depending on what your definition of a random variable actually is, and how much you have already learnt about it, you will have to look at different things.

In short, a measurable space consists of a set $\Omega$, which we call the set of elementary events, together with a $\sigma$-algebra $\Sigma$, which contains all events; it is thus a set of subsets of $\Omega$. A probability space additionally carries a map $P\colon \Sigma\to \mathbb{R}$ that takes an event and maps it to a number, which we usually call the probability of the corresponding event. Of course, a $\sigma$-algebra is not just any set of subsets of $\Omega$; it has to have certain properties. In the same way, $P$ is not just any map; it needs to be a (probability) measure. If any of these terms are unfamiliar to you, then you should start by reading about them in any introductory book on measure theory or probability theory.
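For concreteness, the simplest nontrivial example is a fair coin flip:

$$\Omega=\{H,T\},\qquad \Sigma=\{\emptyset,\{H\},\{T\},\Omega\},\qquad P(\{H\})=P(\{T\})=\tfrac{1}{2},$$

where $\Sigma$ is the full power set of $\Omega$, and $P(\emptyset)=0$, $P(\Omega)=1$, as any probability measure requires.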

In case you have seen the above before: Given a probability space $(\Omega,\Sigma, P)$ and a measurable space $(\Omega',\Sigma')$, we say that a function $f\colon \Omega\to\Omega'$ is $\Sigma$-$\Sigma'$-measurable if for any set $A\in\Sigma'$, we have $f^{-1}(A)\in \Sigma$. A random variable with values in some measurable space $(\Omega',\Sigma')$ is just a fancy way of saying "measurable function from a probability space into $(\Omega',\Sigma')$". If you don't know why this yields a good description of what you intuitively think a random variable should be, then I once again advise you to pick up an introductory book on probability theory.
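Continuing the coin-flip example, the indicator of heads,

$$X\colon\Omega\to\mathbb{R},\qquad X(H)=1,\quad X(T)=0,$$

is a real random variable: for any Borel set $A\subseteq\mathbb{R}$, the preimage $X^{-1}(A)$ is $\emptyset$, $\{T\}$, $\{H\}$, or $\Omega$, depending on which of $0$ and $1$ lie in $A$, and all four of these sets belong to $\Sigma$.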

An especially important case of measurable functions is the measurable functions with values in $\mathbb{R}$. Of course, for the term measurable to make any sense, we need to actually equip $\mathbb{R}$ with a $\sigma$-algebra. People will implicitly assume the Lebesgue or Borel $\sigma$-algebra, and there are some things to be said about what one should choose for which purpose, but this is not the place for that. If you are unfamiliar with these special $\sigma$-algebras, once again, you could read up on those.

If $(\Omega,\Sigma,\mu)$ is any measure space, we denote the set of $\mathbb{R}$-valued measurable functions by $\mathcal{L}(\Omega,\Sigma,\mu)$. This is actually a vector space, which is nice, and $\mu$ allows us to almost equip it with one/several norms. For each $1\leq p<\infty$, the map $\lVert\cdot\rVert_p\colon f\mapsto \left(\int_\Omega \lvert f\rvert^p \,d\mu\right)^{1/p}$ almost defines a norm; there are just two problems: some functions may get the value "infinite", and some functions may get the value $0$ even though they are not the $0$-function (namely those that vanish almost everywhere). In reality, we fix the second issue first, by factoring out all the functions whose $p$-norm is $0$ (this is in every textbook on spaces of measurable functions/measure theory), and then define $L^p:= \{[f]\mid f \in\mathcal{L}(\Omega,\Sigma,\mu),\ \lVert f\rVert_p<\infty\}$, where $[f]$ denotes the equivalence class in the quotient.
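To tie this back to the sequence from Solution 1 (assuming, as above, that $X_n=n$ with probability $\frac{1}{n^2}$ and $0$ otherwise):

$$\lVert X_n\rVert_2 = \left(\int_\Omega \lvert X_n\rvert^2 \,dP\right)^{1/2} = \left(n^2\cdot\tfrac{1}{n^2}\right)^{1/2} = 1,$$

so every $X_n$ lies in $L^2$, and $E(X_n^2)=\lVert X_n\rVert_2^2\to 1$ as claimed there.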

Now these $L^p$-spaces are very useful function spaces with an extensive Wikipedia page, and for every $p$ between $1$ and $\infty$, the pair $(L^p,\lVert\cdot\rVert_p)$ is a normed vector space (they are even complete w.r.t. the norm, i.e., Banach spaces). There are also the $L^\infty$-spaces, but they behave a bit differently, so I leave them out here (it's not more complicated, but either you know them already or you've got to read up on a lot of this anyway, so there's no merit in writing it out).

Now, one special case among those is that of $p=2$. And then "convergence in $L^2$" is just that: convergence of a sequence in $L^2(\Omega,\Sigma,\mu)$ with respect to the norm $\lVert\cdot\rVert_2$. (And convergence in $\mathcal{L}^2$ is convergence w.r.t. $\lVert\cdot\rVert_2$ in $\mathcal{L}^2$, which people usually only talk about in situations where it is not required to factor anything out, because the underlying probability space is discrete.)
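Spelled out for real random variables on a probability space $(\Omega,\Sigma,P)$: $X_n\to X$ in $L^2$ means

$$\lim_{n\to\infty}\lVert X_n - X\rVert_2^2 = \lim_{n\to\infty} E\big((X_n-X)^2\big) = 0.$$

In the situation from Solution 1, the natural candidate limit is $X=0$ (since $P(X_n\neq 0)=\frac{1}{n^2}\to 0$), but $E((X_n-0)^2)=E(X_n^2)\to 1\neq 0$, which is exactly why that sequence fails to converge in $L^2$.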

As an addendum: the space $L^1$ plays a very important role, since we want to be able to compute the expected value of a real random variable, which means that its integral should be finite, i.e., it should be in $L^1$. The reason why the space $L^2$ is so special is that it is actually a Hilbert space, i.e., its norm comes from/induces (depends on point of view) a scalar product. In many situations, we want to project a function onto a subspace in an "orthogonal fashion", i.e., compute a unique closest element of a subspace (this happens all the time in statistical machine learning and statistics, e.g. in regression analysis).
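A minimal numerical sketch of that last point (assuming NumPy; `project_onto_columns` is my own helper name): least-squares regression is exactly the orthogonal projection of a vector of observations onto the subspace spanned by the columns of the design matrix, with the Euclidean norm playing the role of $\lVert\cdot\rVert_2$.

```python
import numpy as np

def project_onto_columns(A, y):
    """Orthogonally project y onto the column space of A via least squares."""
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coeffs

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 3))   # subspace spanned by 3 "functions"
y = rng.standard_normal(100)        # the "function" to project

y_hat = project_onto_columns(A, y)
residual = y - y_hat

# Orthogonality: the residual is perpendicular to every column of A,
# so y_hat is the unique closest point of the subspace.
print(np.abs(A.T @ residual).max())  # ~ 0 up to floating-point error
```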