I am trying to teach myself about stochastic differential equations. In several accounts I've read, the author defines an SDE as an integral equation, in which at least one integral is a stochastic integral, then writes that in practice, people usually write the SDE in differential form. My question is, why?

There are two possible reasons I can guess:

(i) If you write an SDE in differential form, it makes it easy to simulate sample paths using random walks, and you can think of the SDE as a limit of random walks. This may provide some heuristic understanding that the integral form does not.

(ii) If you write an SDE in differential form, you can do "formal" computations with the differentials that are not necessarily rigorous, but lead to correct conclusions. Personally, I have always had trouble understanding differential forms, and from what I've read, it is difficult to define exactly what $dW_t$ means (the differential of the Wiener process, i.e. Brownian motion). I'm not even sure if $dW_t$ has a precise meaning.
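To make reason (i) concrete, here is a minimal sketch that assumes NumPy and uses geometric Brownian motion $dX_t=\mu X_t\,dt+\sigma X_t\,dW_t$ as an illustrative example. The helper name `euler_maruyama_gbm`, the parameters and the seed are arbitrary choices. Each step is read straight off the differential form. Summing the steps gives a left-point (Itô) approximation of the integral equation. For comparison, the sketch also evaluates the known closed form $X_T = X_0\exp\big((\mu-\sigma^2/2)T+\sigma W_T\big)$ on the same Brownian path.

```python
import numpy as np

def euler_maruyama_gbm(x0, mu, sigma, T, n_steps, rng):
    """Simulate dX_t = mu X_t dt + sigma X_t dW_t on [0, T] with the
    Euler-Maruyama scheme (a left-point sum of the integral equation
    X_T = x0 + int_0^T mu X_s ds + int_0^T sigma X_s dW_s).

    Returns the approximate X_T and the simulated endpoint W_T."""
    dt = T / n_steps
    # Brownian increments: independent N(0, dt) steps of a Gaussian random walk
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    x = x0
    for dw in dW:
        # One step of the differential form: dX = mu X dt + sigma X dW
        x = x + mu * x * dt + sigma * x * dw
    return x, dW.sum()

rng = np.random.default_rng(0)
x0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0
x_em, w_T = euler_maruyama_gbm(x0, mu, sigma, T, 10_000, rng)

# Closed-form solution driven by the same Brownian path
x_exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)
print(x_em, x_exact)
```

With 10,000 steps, the two endpoints agree closely. Refining the grid is the "limit of random walks" picture.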


Solution 1:

As someone who does a lot of computations with SDEs, I can tell you my own personal reason that I use the differential notation: convenience.

For one thing, when using differential notation, you don't have to worry about mixing up dummy variables. $$ df(X_t) = f'(X_t)dX_t + \frac12f''(X_t)d\langle X\rangle_t $$ is simple and easy to read, but if I wrote it in integral form $$ f(X_t)-f(X_0) = \int_0^tf'(X_s)dX_s + \frac12 \int_0^tf''(X_s)d\langle X\rangle_s $$ it both takes much longer to write and requires me to use and keep track of the dummy variable $s$. This is especially helpful when working with multidimensional processes.
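As a numerical sanity check of the formula above, the integral form can be compared against a left-point discretization. This is a sketch that assumes NumPy, takes $X=W$ to be a standard Brownian motion (so $d\langle W\rangle_t=dt$), and uses $f(x)=x^2$. The step count and seed are arbitrary. If the $\frac12 f''$ term were dropped, the two sides would differ by about $T$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))  # W_0 = 0, W_{t_1}, ..., W_T

# Left side: f(W_T) - f(W_0) with f(x) = x^2
lhs = W[-1] ** 2 - W[0] ** 2
# Ito integral int_0^T f'(W_s) dW_s as a left-point sum, with f'(x) = 2x
ito_integral = np.sum(2.0 * W[:-1] * dW)
# Correction (1/2) int_0^T f''(W_s) d<W>_s = (1/2) * 2 * T, because d<W>_s = ds
correction = 0.5 * 2.0 * T

print(lhs, ito_integral + correction)  # close to each other
print(lhs - ito_integral)              # close to T = 1, not 0
```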

Another major factor is a very handy dandy abuse of notation: $dX_tdY_t := d\langle X,Y \rangle_t$. Using this abuse of notation, I can "multiply" stochastic differentials using ordinary algebraic manipulation, for example $$ d\langle f(X) \rangle_t = df(X_t)\,df(X_t) = (f'(X_t))^2\,d\langle X\rangle_t + f'(X_t)f''(X_t)\,d\langle X, \langle X\rangle\rangle_t + \frac14(f''(X_t))^2\,d\langle\langle X \rangle\rangle_t $$ $$ = (f'(X_t))^2\,d\langle X \rangle_t + 0 + 0, $$ which actually gives the correct answer: the last two terms vanish because $\langle X\rangle$ is continuous and of finite variation, so its covariation with any continuous semimartingale (including itself) is zero. One should be VERY careful abusing notation like this, but if you know what hypotheses allow you to use the shortcut, it saves a great deal of time in doing computations.
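Here is a quick numerical check of the conclusion $d\langle f(X)\rangle_t = (f'(X_t))^2\,d\langle X\rangle_t$. It is a sketch that assumes NumPy, takes $X=W$, and makes the arbitrary choice $f=\sin$. On a fine grid, the sum of squared increments of $f(W)$ should be close to $\int_0^T \cos^2(W_s)\,ds$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

f = np.sin        # test function f
f_prime = np.cos  # its derivative f'

# Realized quadratic variation of f(W): sum of squared increments
realized_qv = np.sum(np.diff(f(W)) ** 2)
# Prediction from the formal product: int_0^T f'(W_s)^2 d<W>_s = int_0^T f'(W_s)^2 ds
predicted_qv = np.sum(f_prime(W[:-1]) ** 2) * dt

print(realized_qv, predicted_qv)
```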

I should note that I am NEVER going to assign meaning to an expression like $dW_t$ on its own. I will ONLY use it in a context where a corresponding integral expression makes sense.

Solution 2:

My education in SDEs is all self-taught, so please excuse any inaccuracies below. In particular, my memory is fuzzy on issues such as an SDE versus its associated PDE, or backward versus forward equations.

In differential equations, and in applications, we often look at differentials because that's all we can infer in a straightforward way. Locally, we can make arguments about how something should behave, and then we integrate to find out what is going on over global (finite, rather than infinitesimal) distances.

The solution, by contrast, may depend heavily on what else is going on, such as initial conditions and boundary conditions. The local differential equation may hold in general, while the actual global values depend greatly on the larger environment. For instance, Maxwell's equations give the same local differential relation between charge and electromagnetic field for any charge distribution, but the actual solutions depend greatly on that distribution.

In Black-Scholes-Merton, the no-arbitrage behavior is locally described by the same equation, but the solution can vary greatly depending on, for instance, which option payoff is imposed as the final-time (terminal) condition. Also, the chain rule (Itô's formula) is central to SDEs, and it is most naturally applied at the differential level.

Also (and my knowledge of this is very informal), I don't consider the $dW_t$ term to be so imprecise. It is probably often taught imprecisely, with quite a bit of hand waving, but it is not hard to come up with convincing arguments about its behavior. I have in my notes heuristics like $dW_t \approx Z\sqrt{dt}$, where $Z$ is a standard normal random variable, and $W_t=\lim_{n\to\infty}\sum_{i=1}^n Y_i\sqrt{\frac{t}{n}}$ (a limit in distribution), where the $Y_i$ are independent and each equal to $-1$ or $1$ with probability $1/2$ (this is more or less from Finan's text on MFE, a free online textbook). I may have some details slightly off, so hopefully someone more motivated can check them against the source. Also, I don't think that sort of representation is unique, but any one of them suffices. This reminds me of how you can represent the delta function as the limit of various distributions.
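The coin-flip representation can be checked numerically. Below is a minimal sketch that assumes NumPy. The helper name `scaled_coin_walk` is illustrative and does not come from Finan's text, and the sample sizes are arbitrary. Each sample of $\sum_{i=1}^n Y_i\sqrt{t/n}$ has mean $0$ and variance exactly $t$. By the central limit theorem, its distribution approaches $N(0,t)$, the law of $W_t$, as $n\to\infty$.

```python
import numpy as np

def scaled_coin_walk(t, n, num_paths, rng):
    """Return num_paths samples of sum_{i=1}^n Y_i * sqrt(t/n), where the Y_i
    are independent fair coin flips taking the values -1 or +1."""
    # int8 keeps memory small; np.sum upcasts small integer types automatically
    Y = rng.integers(0, 2, size=(num_paths, n), dtype=np.int8) * 2 - 1
    return Y.sum(axis=1) * np.sqrt(t / n)

rng = np.random.default_rng(3)
t = 1.0
samples = scaled_coin_walk(t, n=500, num_paths=20_000, rng=rng)
print(samples.mean(), samples.var())  # approximately 0 and t
```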

Solution 3:

Intuitively, the symbol $\text{d}W_{t}$ may be interpreted as an infinitesimal increment of a (one-dimensional) Wiener process $W$: $$\text{d}W_{t} = W_{t+\text{d}t}-W_{t}\ .$$ Increments of a Wiener process are normally distributed random variables whose expected value is zero and whose variance is equal to the time increment. Thus the symbol $\text{d}W_{t}$ can be interpreted as a normally distributed random variable with expected value $0$ and variance $\text{d}t$. This implies that it is quite likely to find the random variable $\text{d}W_{t}$ somewhere between the numbers $-\sqrt{\text{d}t}$ and $\sqrt{\text{d}t}$. The probability is about $0.68$, which is the probability that a standard normal variable lies in $[-1,1]$.
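The probability quoted above can be computed exactly and checked by simulation. This is a sketch that assumes NumPy, and the value of `dt` is an arbitrary choice. Because $\text{d}W_t/\sqrt{\text{d}t}$ is standard normal, the answer is $P(|Z|\le 1)=\operatorname{erf}(1/\sqrt2)$ for every $\text{d}t>0$.

```python
import math
import numpy as np

dt = 1e-3  # any positive time increment gives the same probability

# Exact: P(|dW| <= sqrt(dt)) with dW ~ N(0, dt) equals P(|Z| <= 1) = erf(1/sqrt(2))
p_exact = math.erf(1.0 / math.sqrt(2.0))

# Monte Carlo check with simulated increments
rng = np.random.default_rng(4)
dW = rng.normal(0.0, math.sqrt(dt), size=1_000_000)
p_mc = np.mean(np.abs(dW) <= math.sqrt(dt))

print(round(p_exact, 4), p_mc)  # exact value rounds to 0.6827
```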

As to a precise meaning of the symbol $\text{d}W_{t}\ ,$ you have to be more specific about what you mean by "precise". After all, what is the precise meaning of the symbol $\text{d}t$? (Without using 1-forms or nonstandard analysis.)

Solution 4:

I like your second reason, which is also what I think. Writing SDEs in differential form does more than make the various complicated computations easier to handle. It also often leads to PDEs that can be solved in the deterministic sense. For example, in the Black-Scholes-Merton case, the SDE combined with Itô's formula and suitable terminal and boundary conditions leads to a backward parabolic PDE that we can solve. Therefore, it's better to write an SDE in the well-known fashion we are used to, even though it's "formal" and doesn't have a specific meaning on its own.
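For concreteness, here is a sketch of how that PDE arises under the standard textbook assumptions. These are: constant $\mu$, $\sigma$ and risk-free rate $r$; a self-financing, delta-hedged portfolio; and an option price $V(t,S)$ smooth enough for Itô's formula. The stock follows $dS_t=\mu S_t\,dt+\sigma S_t\,dW_t$, and the differential rule $dS_t\,dS_t=\sigma^2S_t^2\,dt$ does most of the work:
$$ dV(t,S_t) = \left(\partial_t V + \mu S_t\,\partial_S V + \tfrac12\sigma^2 S_t^2\,\partial_{SS} V\right)dt + \sigma S_t\,\partial_S V\,dW_t . $$
For the portfolio $\Pi_t = V(t,S_t) - \partial_S V\,S_t$, the $dW_t$ terms cancel:
$$ d\Pi_t = dV - \partial_S V\,dS_t = \left(\partial_t V + \tfrac12\sigma^2 S_t^2\,\partial_{SS} V\right)dt . $$
Since this portfolio is riskless, no arbitrage forces $d\Pi_t = r\Pi_t\,dt$, which gives
$$ \partial_t V + r S\,\partial_S V + \tfrac12\sigma^2 S^2\,\partial_{SS} V - rV = 0, \qquad V(T,S) = \text{payoff}(S), $$
for example $\text{payoff}(S)=\max(S-K,0)$ for a European call with strike $K$. The equation is backward because the condition is imposed at the final time $T$.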

Solution 5:

I am somewhat inexperienced and also self-taught, but from my understanding, one of the primary advantages of writing it in this form is that you can "abuse" notation, much as people do when they multiply both sides by $dx$ and cancel the $dx$ in $dy/dx$. There is a lot of theory behind calculating quadratic variations, but it is easier to understand and process if you just use "formulas" like $dW_{t}dW_{t} = dt$ (together with $dt\,dW_t = 0$ and $dt\,dt = 0$).
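These rules can also be seen numerically. Over a grid of $n$ steps on $[0,T]$, summing each product of increments approximates the corresponding integral. As the grid is refined, $\sum (\Delta W)^2 \to T$, while $\sum \Delta t\,\Delta W$ and $\sum (\Delta t)^2$ go to $0$. This is a minimal sketch that assumes NumPy, and the step count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)

sum_dW_dW = np.sum(dW * dW)  # "dW dW = dt": should be close to T
sum_dt_dW = np.sum(dt * dW)  # "dt dW = 0": equals dt * W_T, which tends to 0
sum_dt_dt = n * dt * dt      # "dt dt = 0": equals T * dt, which tends to 0

print(sum_dW_dW, sum_dt_dW, sum_dt_dt)
```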