In stochastic calculus, why do we have $(dt)^2=0$ and other results?
Solution 1:
Others have already told you that a proper proof is more involved. However, I want to point out that there is a bit of intuition behind those rules. For me, having seen the way physicists treat ordinary calculus, the following is rather insightful:
Think of $dt$ as a really small increase in time, of $dB_t$ as the change in the Brownian motion over this small time increase, and so on. Then the change of a quantity $X_t$ over some time interval is just the sum of all such small changes $dX_t$ or, bending notation, the integral. In a way, this is also what you do when defining the stochastic integral.
Using this, $(dt)^2$ is simple. If you split your interval into $\sim N$ steps, then $dt \sim \frac{1}{N}$, so $(dt)^2 \sim \frac{1}{N^2}$. Yet if you sum $N$ steps of size $\frac{1}{N^2}$, the total is of order $\frac{1}{N}$, and in the limit $N\to \infty$ you end up with nothing. So $(dt)^2$ has no effect.
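The counting argument can be checked numerically. The following is only a minimal sketch: the function name `sum_of_squared_steps` and the choice of a unit time interval are assumptions made for illustration.

```python
def sum_of_squared_steps(n_steps, total_time=1.0):
    """Sum (dt)^2 over n_steps equal steps of length dt = total_time / n_steps."""
    dt = total_time / n_steps
    return sum(dt ** 2 for _ in range(n_steps))


# The total behaves like 1/N, so it shrinks as the partition is refined.
for n in (10, 100, 1000, 10000):
    print(n, sum_of_squared_steps(n))
```

Each tenfold refinement of the partition makes the total roughly ten times smaller, which is the sense in which $(dt)^2$ contributes nothing in the limit.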
Now $dB_t$ is a strange beast. It is of size $\sqrt{dt}$ (have a look at the scaling of the probability density in $x$ and $t$), so by the reasoning above, $N$ steps of size $\frac{1}{\sqrt{N}}$ would add up to something of order $\sqrt{N}$ and the integral should explode. Yet it changes sign all the time, and those changes cancel each other quite nicely in the end.
On the other hand, $(dB_t)^2 \sim (\sqrt{dt})^2 = dt$ has a fixed sign, so the calculation rule at least is not surprising. (That it is actually an equality, however, is not so trivial.)
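A small simulation illustrates both remarks about $dB_t$. This is a sketch under stated assumptions: increments are drawn as independent Gaussians with variance $dt$, and the function name `brownian_sums`, the seed, and the interval $[0,1]$ are illustrative choices.

```python
import math
import random


def brownian_sums(n_steps, total_time=1.0, seed=0):
    """Return (sum dB, sum |dB|, sum dB^2) for simulated Brownian increments."""
    rng = random.Random(seed)
    dt = total_time / n_steps
    sum_db = 0.0
    sum_abs_db = 0.0
    sum_db_sq = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))  # increment with variance dt
        sum_db += db           # signs cancel: stays of order 1
        sum_abs_db += abs(db)  # no cancellation: grows like sqrt(N)
        sum_db_sq += db * db   # fixed sign: close to total_time
    return sum_db, sum_abs_db, sum_db_sq


for n in (100, 10_000, 100_000):
    print(n, brownian_sums(n))
```

As $N$ grows, the sum of absolute increments keeps growing, the signed sum stays moderate, and the sum of squares settles near the length of the time interval.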
The last one, $dB_t\,dt$, is again simple: it is of order $(dt)^{3/2}$, so summing $N$ such terms gives something of order $N^{-1/2}$, and the same argument as for $(dt)^2$ applies.
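The same kind of check works for the mixed term. The sketch below sums $|dB_t|\,dt$, using absolute values so that no cancellation helps; the function name `sum_abs_db_dt` and the seed are assumptions for illustration.

```python
import math
import random


def sum_abs_db_dt(n_steps, total_time=1.0, seed=0):
    """Sum |dB| * dt over n_steps simulated Brownian increments."""
    rng = random.Random(seed)
    dt = total_time / n_steps
    return sum(abs(rng.gauss(0.0, math.sqrt(dt))) * dt for _ in range(n_steps))


# Even without sign cancellation the total decays like N^(-1/2).
for n in (100, 10_000, 100_000):
    print(n, sum_abs_db_dt(n))
```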
I should stress again that this is only intuition, which cannot prove anything; it only helps in finding what to prove. For example, using this, Ito's formula looks just like a Taylor expansion in $dB_t$ and $dt$ in which you throw away the terms of order higher than $dt$.
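To make the Taylor-expansion remark concrete, here is the heuristic computation for a smooth function $f(t,x)$ evaluated along $B_t$ (the function $f$ and the subscript notation for partial derivatives are introduced here for illustration only):

$$
\begin{aligned}
df(t,B_t) &= f_t\,dt + f_x\,dB_t + \tfrac12 f_{xx}\,(dB_t)^2 + f_{tx}\,dt\,dB_t + \tfrac12 f_{tt}\,(dt)^2 + \dots \\
&= \left(f_t + \tfrac12 f_{xx}\right)dt + f_x\,dB_t ,
\end{aligned}
$$

using $(dB_t)^2 = dt$ and $dt\,dB_t = (dt)^2 = 0$. Like the rest of this answer, this is a mnemonic rather than a proof.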
Solution 2:
For stochastic processes of the form $dX_t = \theta_t\,dt + K_t\,dB_t$ and $dY_t = \gamma_t\,dt + L_t\,dW_t$, where $B_t$ and $W_t$ are two correlated Brownian motions with correlation coefficient $\rho$, you have $d\langle X,Y\rangle_t = \rho K_t L_t\,dt$ (for more details see convergence of quadratic variation and covariation of stochastic processes).
The examples you mentioned above are particular cases:
$(dt)^2 = 0$: here the process has no diffusion, i.e. $X(t)=t$, meaning $K_t=0$ and $\theta_t=1$ $\forall t$, so $d\langle X,X\rangle_t=(dt)^2=0$.
$dZ(t)^2 = dt$: here you take $X(t)=B(t)$, where $B(t)$ is a Brownian motion; this gives $K_t=1$ and $\theta_t=0$ $\forall t$, and thus you obtain $d\langle X,X\rangle_t=dt$ (notice that the correlation between a process and itself is $1$).
For the third example, take $X(t)=t$ and $Y(t)=W(t)$; this gives $K_t=0$ and $L_t=1$, so $d\langle X,Y\rangle_t = \rho K_t L_t\,dt = 0$.
For all continuous processes $X_t$ with bounded variation and any other stochastic process $Y_t$ of this form, you have $d\langle X,X\rangle_t=d\langle X,Y\rangle_t=0$.
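The covariation rule for two correlated Brownian motions can be checked by simulation. This is a rough sketch under explicit assumptions: the coefficients are taken constant, $W$ is built as $\rho B + \sqrt{1-\rho^2}\,Z$ with an independent Brownian motion $Z$, and the names `realized_covariation`, `k_coef`, and `l_coef` are illustrative.

```python
import math
import random


def realized_covariation(rho, k_coef=1.0, l_coef=1.0, theta=0.0, gamma=0.0,
                         n_steps=100_000, total_time=1.0, seed=0):
    """Sum dX * dY for dX = theta dt + K dB, dY = gamma dt + L dW, corr(B, W) = rho."""
    rng = random.Random(seed)
    dt = total_time / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, sqrt_dt)
        dz = rng.gauss(0.0, sqrt_dt)  # independent of db
        dw = rho * db + math.sqrt(1.0 - rho * rho) * dz  # correlated with db
        dx = theta * dt + k_coef * db
        dy = gamma * dt + l_coef * dw
        total += dx * dy
    return total


# With constant coefficients the rule predicts rho * K * L * T = 0.5 * 2 * 3 * 1 = 3.
print(realized_covariation(0.5, k_coef=2.0, l_coef=3.0, theta=1.0, gamma=-1.0))
# A pure drift (K = 0) has zero covariation with anything.
print(realized_covariation(0.5, k_coef=0.0, l_coef=1.0, theta=1.0))
```

The drift terms are included on purpose: their contribution to the sum vanishes as the partition is refined, which is the bounded-variation statement above.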
Solution 3:
Others have better answers, but in a simpler way:
$(dt)^2 \approx 0$
Calculus looks at changes during small time steps $dt$. When $dt$ is tiny, $(dt)^2 \ll dt$, so $(dt)^2$ is approximated by $0$. $[0.001 \gg (0.001)^2 = 0.000001]$.
$dZ(t)^2 = dt$
This has to do with the 'scaling' chosen to get a nice process. Out of three cases, only $dZ(t)^2 \approx O(dt)$ gives a stable process, so we use $dZ(t)^2 = dt$.
[The two discarded cases are
- $dZ(t)^2 \rightarrow 0$ quicker than $dt$ does - process collapses to $0$,
- $dt \rightarrow 0$ quicker than $dZ(t)^2$ does - process grows indefinitely.]
$dZ(t)dt=0$
This uses the first and second points. Pretty much anything much smaller than $dt$ is $\approx 0$, and we know $dZ(t) \sim \sqrt{dt}$ (from $dZ(t)^2 = dt$), therefore $dZ(t)\,dt \sim (dt)^{3/2} \ll dt$.
EDIT: I think this is actually a botched answer more related to why we pick what we do when going from the BSE to the Kolmogorov equation.