If we can't Stieltjes integrate Brownian motion pathwise, then what do the values of the Ito integral represent?
The Ito integral strongly resembles the Stieltjes integral (in either Lebesgue or Riemann flavor), to the point that I would almost want to define $(\int_\text{Ito}X \, dM)(\omega)$ as the Stieltjes integral $\int X(\omega) \, dM(\omega)$. The problem of course is that when $M$ is the sort of stochastic process we're interested in, $M(\omega)$ is almost always a function of unbounded variation.
The construction of the Ito integral somehow gets around this by defining itself on the entire probability space all at once, rather than trying to do it pointwise at each $\omega$. But where exactly the magic happens is mysterious to me. At the end of the construction, we somehow end up with a value $I(X,M)(\omega)$. If not the Stieltjes integral of $X$ against $M$, then what is that value? What does it represent?
The whole thing reminds me conditional expectation. We can't define conditional expectation at a point because we'd be trying to condition on an event of measure zero, but if we instead define it on the entire space all at once, via a kind of global property (it should integrate to something appropriate) then we make sense of the problem. Is something similar happening here?
I found this quote on a related question that sounds appealing but I don't know exactly what it means: "...you get the integral evaluated as a limit of sums over partitions converges in probability." (source).
Solution 1:
Let $(B_t)_{t \geq 0}$ be a Brownian motion. Let's forget for the moment everything we already know about stochastic integrals and start at the very beginning of the story:
We would like to define a stochastic integral $\int \dots \, dB_r$ with respect to Brownian motion. Which properties do we expect the stochastic integral to have?
Wish Nr. 1: The stochastic integral should be an additive mapping, i.e. $$\int_0^t (G(r)+H(r)) \, dB_r = \int_0^t G(r) \, dB_r + \int_0^t H(r) \, dB_r$$ whenever we can make sense of the integrals.
Wish Nr. 2: We would like to have $$\int_0^t dB_r = B_t-B_0$$.
Wish Nr. 3: .... is some sort of pull out property. If a random variable $\xi$ does not depend on the values of $(B_r)_{r > s}$, then $$\int_s^t \xi \, dB_r = \xi \int_s^t \, dB_r$$ ("pull out what is known").
If we combine these three properties, we find that the stochastic integral of a simple function of the form
$$H(r,\omega) = \sum_{j=1}^n \xi_j(\omega) 1_{[t_{j-1},t_j)}(r)$$
(where $t_0< \ldots < t_n \leq T$ and $\xi_j$ is $\mathcal{F}_{t_{j-1}}$-measurable) is given by
$$\int_0^T H(r) \, dB_r = \sum_{j=1}^n \xi_j (B_{t_j}-B_{t_{j-1}}) = \sum_{j=1}^n H(t_{j-1}) (B_{t_j}-B_{t_{j-1}}). \tag{2}$$
So far all the integrals can be interpreted as pointwise ($\omega$-wise) integrals, and, moreover, we have a nice interpretation for the integrals.
The bad news: Defining the stochastic integral
$$\int_0^t H(r) \, dB_r$$
as a pointwise limit of Riemann sums of the form $(2)$ works only for a very small class of integrands $H$; this is closely related to the fact that the Brownian motion has almost surely unbounded variation (see this answer for more details). For instance if $H:[0,1] \to \mathbb{R}$ is a continuous mapping, we cannot expect that the limit of the Riemann sums
$$\lim_{n \to \infty} \sum_{j=1}^n H((j-1)/n) (B_{j/n} -B_{(j-1)/n})$$
exists with probability $1$.
The good news: We can simply consider a different type of convergence. Almost sure convergence of a sequence is a pretty strong assumption, and therefore it is quite natural to consider weaker forms of convergence. The most natural choice is clearly convergence in probability (or, alternatively, convergence in $L^2$). It turns out that the limit the Riemann sums converge very nicely in probability, and this allows us to define the stochastic integral for a large class of integrands. Some nice things about this definition:
Almost sure convergence implies convergence in probability. This means that if we have an integrand $H$ for which the integral $\int_0^t H(r) \, dB_r$ can be defined pathwise as the limit of Riemann sums, then both notions coincide.
If $(H_t)_{t \geq 0}$ is a continuous adapted process, then $$\sum_{j=1}^n H(t_{j-1}) (B_{t_j}-B_{t_{j-1}}) \stackrel{n \to \infty}{\to} \int_0^t H(r) \, dB_r \quad \text{in probability}$$ for any sequence $\Pi_n = \{0=t_0 < \ldots < t_n = t\}$ of partitions of $[0,t]$ with mesh size converging to $0$. Note that this implies in particular that there exist a subsequence which converges almost surely.
- The same construction works for a much larger class of driving processes; for instance we can easily replace $(B_t)_{t \geq 0}$ by a martingale with continuous sample paths.
I hope that this answer gave you some insight why
... Riemann sums pop up in the Itô theory although the Itô integral is not defined as a Riemann integral (namely, because we want to have $(2)$ for simple functions)
... there is nothing magic about it. Pointwise convergence does not work, but by considering a weaker form of convergence we can get rid of this problem and obtain a nice stochastic calculus for a large class of integrands.