Time derivative of white noise
It is well known that white noise $\xi(t)$ is formally defined as the time derivative of the Wiener process $W(t)$ \begin{align} \xi(t) = \frac{dW}{dt} \end{align} By writing $dW/dt$ in finite-difference form \begin{align} \frac{dW}{dt} \approx \frac{1}{h} \Big[ W(t+h) - W(t) \Big] \end{align} and taking $h \to 0$, the book "An Introduction to Stochastic Differential Equations" by Lawrence C. Evans (Chapter $3$) shows that the statistics of $\xi(t)$ are given by \begin{align} E[ \, \xi(t) \, ] =0, \quad \mathrm{and} \quad E[ \, \xi(t) \xi(s) \, ] = \delta(t-s) \end{align} which is exactly what we expect of white noise.
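(Not part of the book's argument, but a quick numerical sanity check I find helpful: the sketch below, with the step size and grid chosen arbitrarily, simulates Brownian increments and estimates the statistics of the finite-difference process $\xi_h(t) = [W(t+h)-W(t)]/h$; the sample mean is near $0$, the variance is near $1/h$, and distinct times are essentially uncorrelated, which is the discrete shadow of $E[\xi(t)\xi(s)] = \delta(t-s)$.)

```python
# Hypothetical sanity check (my own, not from Evans): estimate the statistics of the
# finite-difference process xi_h(t) = [W(t+h) - W(t)] / h for a simulated Wiener process.
# Expectation: mean ~ 0, E[xi_h(t)^2] ~ 1/h, and E[xi_h(t) xi_h(s)] ~ 0 for |t - s| >= h,
# which is the discrete counterpart of the delta covariance.
import numpy as np

rng = np.random.default_rng(0)
h, n_steps, n_paths = 0.01, 200, 100_000   # step size and grid chosen arbitrarily

# Independent Brownian increments: W(t+h) - W(t) ~ N(0, h).
dW = rng.normal(0.0, np.sqrt(h), size=(n_paths, n_steps))
xi = dW / h                                # finite-difference "white noise"

print("mean          ~", xi.mean())                                  # ~ 0
print("E[xi(t)^2]    ~", (xi[:, 50] ** 2).mean(), "vs 1/h =", 1 / h)
print("E[xi(t)xi(s)] ~", (xi[:, 50] * xi[:, 60]).mean(), "(t != s, ~ 0)")
```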
My Question: Motivated by the derivation in that book, I was wondering why we cannot take the second time derivative of the Wiener process (equivalently, the time derivative of white noise). Here is my attempt to take the time derivative of white noise and derive the corresponding statistics.
Define $\eta(t) = [ \,\xi(t+h) - \xi(t) \, ] \, / \, h $ and take $h \to 0$ at the last step.
Expected value of $\eta(t)$: \begin{align} E[ \, \eta(t) \,] = \frac{1}{h} \Big\{ E[ \, \xi(t+h) \,] - E[ \, \xi(t) \,] \Big\} = 0 \end{align}
Covariance of $\eta(t)$: \begin{align} E[ \, \eta(t) \eta(s) \,] &= \frac{1}{h^{2}} E \Bigg[ \Big( \xi(t+h) - \xi(t) \Big) \; \Big( \xi(s+h) - \xi(s) \Big) \Bigg] \\ &= \frac{1}{h^{2}} \; \Bigg[ E[\xi(t+h) \xi(s+h)] - E[\xi(t+h) \xi(s)] - E[\xi(t) \xi(s+h)] + E[\xi(t) \xi(s)] \Bigg] \\ &= - \frac{1}{h^{2}} \Big[ \delta(t-s+h) - 2\delta(t-s) + \delta(t-s-h) \Big] \\ &= - \frac{d^{2}}{dz^{2}} \delta(z) \Bigg|_{z = t-s} \qquad \mathrm{as} \quad h \to 0 \\ &= - \frac{2\delta(t-s)}{(t-s)^{2}} \end{align}
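(To justify the limit step above: the bracketed term is a second central difference of the delta function. Pairing it with a smooth, compactly supported test function $\phi$, which I introduce here only for this check, gives \begin{align} \frac{1}{h^{2}} \int \Big[ \delta(z+h) - 2\delta(z) + \delta(z-h) \Big] \phi(z) \, dz = \frac{\phi(-h) - 2\phi(0) + \phi(h)}{h^{2}} \; \xrightarrow{h \to 0} \; \phi''(0) = \int \delta''(z) \phi(z) \, dz \end{align} so the second difference quotient of $\delta$ converges to $\delta''$ in the sense of distributions.)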
It seems that we can define $\eta(t)$ (the time derivative of white noise) with the above statistics. Is there any fault in the above derivation?
Follow-up
In stochastic differential equations (SDEs), we usually write $dX_{t}$ rather than $dX/dt$, where $X$ is a stochastic process; the SDE really stands for an integral equation. However, in physics and nonlinear dynamics we often see notation like $dX/dt$ (the time derivative of a random process). In some papers we can even see the time derivative of white noise, and that bothers me a lot, since I have always heard that the time derivative of white noise is undefined. So, what is the fundamental reason for not defining the time derivative of a random process?
Solution 1:
There isn't really an issue with taking derivatives of stochastic processes like $W$, as long as you interpret the resulting process appropriately. Even the usual white noise process "$\xi = \frac{dW}{dt}$" should really be interpreted as a generalized stochastic process, that is, the realizations of $\xi$ are generalized functions. This is because the realizations of $W$ are almost surely nowhere differentiable. However, they do have derivatives "in the sense of distributions", that is, generalized derivatives, and this is one way to attack the problem (the Itô calculus / stochastic differential form $dW_t$ is another). If you have never seen the theory of generalized functions (called distributions elsewhere, but in probability that word has another meaning), the following will probably not make too much sense to you, but this is how I work with these things. Gel'fand and Vilenkin ("Generalized Functions, Volume IV") is the classic reference for this approach, but there are probably better modern references.
To define a generalized stochastic process $\eta$, you fix a space of test functions - usually smooth, compactly supported functions $\mathcal{D} = C_0^\infty$. Then, a generalized stochastic process $\eta(\omega)$ is a random element of $\mathcal{D}^\prime$ (a map $\eta:\Omega\rightarrow\mathcal{D}^\prime$ where $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space). A much more convenient way to say this is that given any test function $\varphi\in\mathcal{D}$, we have that
$$ X_\varphi = \langle \eta,\varphi\rangle $$ is an ordinary real random variable. The bracket notation is intended to "look like" an inner product, i.e. you can think of $\langle \eta,\varphi\rangle = \int \eta(x)\varphi(x)dx$, though this isn't really correct because $\eta$ is "not a function".
The mean and covariance are then defined as
$$ \langle\mathbb{E}[\eta],\varphi\rangle = \mathbb{E}[\langle\eta,\varphi\rangle] = \mathbb{E}[X_\varphi] $$ and
$$ Cov(\varphi,\psi) = \mathbb{E}[X_\varphi X_\psi] $$ From this, you can extract the covariance operator via
$$ \mathbb{E}[X_\varphi X_\psi] = \langle \mathcal{C}\varphi,\psi\rangle $$ This formula is difficult to parse until you work some examples - we'll see in a second how this works.
Returning to your original question: suppose we want to define $\dot{W}$ using this approach. Well, in the theory of generalized functions, we have the definition
$$ X_\varphi = \langle \dot{W},\varphi\rangle = - \langle W,\dot{\varphi}\rangle $$The negative sign comes from "integration by parts". Now, because $W$ is (almost surely) continuous and $\dot{\varphi}$ is smooth, we can use integrals instead of "abstract brackets":
$$ X_\varphi(\omega) = -\int_{-\infty}^\infty W(t,\omega)\dot{\varphi}(t) dt $$ Thus (interchanging the expectation and the integral requires a moment of justification, e.g. Fubini's theorem):
$$ \mathbb{E}[X_\varphi(\omega)] = -\int_{-\infty}^\infty \mathbb{E}[W(t,\omega)] \dot{\varphi}(t) dt = 0 $$ and
$$ \mathbb{E}[X_\varphi(\omega)X_\psi(\omega)] = \int_{-\infty}^\infty\int_{-\infty}^\infty \mathbb{E}[W(s,\omega)W(t,\omega)] \dot{\varphi}(s)\dot{\psi}(t) dsdt = \int_{-\infty}^\infty\int_{-\infty}^\infty \min(s,t) \dot{\varphi}(s)\dot{\psi}(t) dsdt $$ To see how this results in "$k(s,t) = \delta(s-t)$" covariance, you do a bit of calculus, remembering that $\varphi(s)$ and $\psi(t)$ are smooth and compactly supported so all the integration by parts boundary terms vanish, and you see that
$$ \int_{-\infty}^\infty\int_{-\infty}^\infty \min(s,t) \dot{\varphi}(s)\dot{\psi}(t) dsdt = \int_{-\infty}^\infty \varphi(t) \psi(t) dt $$ Thus we have written
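(Here is a sketch of that calculation, under the simplifying assumption that $\varphi$ and $\psi$ are supported in $(0,\infty)$, where $\mathbb{E}[W(s)W(t)] = \min(s,t)$ applies. Splitting the inner integral at $s = t$ and integrating by parts once,
$$ \int_0^\infty \min(s,t)\,\dot{\varphi}(s)\,ds = \int_0^t s\,\dot{\varphi}(s)\,ds + t\int_t^\infty \dot{\varphi}(s)\,ds = \Big(t\varphi(t) - \int_0^t \varphi(s)\,ds\Big) - t\varphi(t) = -\int_0^t \varphi(s)\,ds $$
and then one more integration by parts against $\dot{\psi}(t)$ gives
$$ -\int_0^\infty \dot{\psi}(t)\Big(\int_0^t \varphi(s)\,ds\Big)dt = \int_0^\infty \psi(t)\varphi(t)\,dt $$
with all boundary terms vanishing by compact support.)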
$$ \mathbb{E}[X_\varphi X_\psi] = \langle \mathcal{C}\varphi,\psi\rangle $$where $\mathcal{C}$ is the "identity operator", that is the convolution operator with kernel $\delta(s-t)$.
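(Again not part of the argument, just a numerical illustration: the sketch below, with the test function $\varphi(t) = \sin^2(\pi t)$ on $[0,1]$ chosen purely for convenience, approximates $X_\varphi = -\int W\dot{\varphi}\,dt$ over many simulated Brownian paths and checks that $\mathrm{Var}(X_\varphi) \approx \int \varphi^2\,dt = 3/8$, consistent with $\mathcal{C}$ being the identity.)

```python
# Hypothetical Monte Carlo check (my own sketch): for X_phi = -∫ W(t) phi'(t) dt with
# phi(t) = sin^2(pi t) supported on [0, 1], the identity-covariance result above predicts
# E[X_phi] = 0 and Var(X_phi) = ∫ phi(t)^2 dt = 3/8.
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 50_000, 1_000
dt = 1.0 / n_steps
t = np.linspace(0.0, 1.0, n_steps + 1)

# Brownian paths on [0, 1]: W(0) = 0, increments ~ N(0, dt).
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

phi_dot = np.pi * np.sin(2.0 * np.pi * t)      # derivative of sin^2(pi t)
X = -np.sum(W * phi_dot, axis=1) * dt          # X_phi ~ -sum_k W(t_k) phi'(t_k) dt

print("E[X_phi]   ~", X.mean(), "(should be ~ 0)")
print("Var(X_phi) ~", X.var(), "vs ∫ phi^2 dt =", 3 / 8)
```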
If you want to do the same thing but with $\ddot{W}$, you would start with the definition of the generalized ("distributional") second derivative:
$$ \langle\ddot{W},\varphi\rangle = \langle W,\ddot{\varphi}\rangle $$ You can then work through the same process to see that
$$ \langle\mathbb{E}[\ddot{W}],\varphi\rangle = \langle\mathbb{E}[W],\ddot{\varphi} \rangle = 0 $$ and
$$ \mathbb{E}[X_\varphi X_\psi] = \int_{-\infty}^\infty\int_{-\infty}^\infty\min(s,t) \ddot{\varphi}(s)\ddot{\psi}(t) dsdt = -\int_{-\infty}^\infty \ddot{\varphi}(t)\psi(t) dt = \langle \mathcal{C}\varphi,\psi\rangle $$ Thus the covariance operator is the negative second derivative, i.e. the covariance kernel function is $-\ddot{\delta}(s-t)$.
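(The middle equality follows by applying the $\min(s,t)$ identity derived above with $\dot{\varphi},\dot{\psi}$ in place of $\varphi,\psi$, which are again test functions, and then integrating by parts once more:
$$ \int_{-\infty}^\infty\int_{-\infty}^\infty \min(s,t)\,\ddot{\varphi}(s)\ddot{\psi}(t)\,dsdt = \int_{-\infty}^\infty \dot{\varphi}(t)\dot{\psi}(t)\,dt = -\int_{-\infty}^\infty \ddot{\varphi}(t)\psi(t)\,dt. $$
Note that this matches the $-\delta''(t-s)$ covariance obtained with the finite-difference argument in the question.)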
Additional note
In response to a good comment: how do we know that the processes $\dot{W}$ and $\ddot{W}$ are Gaussian? First, a generalized Gaussian random process $\eta$ is one for which any random vector formed by testing against $N$ test functions is (multivariate) Gaussian, i.e. if
$$ X_{\varphi_1:\varphi_N} = [\langle \eta,\varphi_1\rangle,\ldots,\langle \eta,\varphi_N\rangle ]^t \in \Bbb{R}^N $$ then $\eta$ is Gaussian if and only if $X_{\varphi_1:\varphi_N}$ is Gaussian for every choice of $(\varphi_1,\ldots,\varphi_N)\in \mathcal{D}^N$. With this definition, it is easy to show that if $W$ is a classical Gaussian random process - say one with almost surely continuous paths such as the Wiener process - then $W$ is also a generalized Gaussian random process.
Then, Gaussianity of the (generalized) derivatives of $W$ follows from the definitions $$ \langle\dot{W},\varphi\rangle := - \langle W,\dot{\varphi}\rangle\\ \langle\ddot{W},\varphi\rangle := \langle W,\ddot{\varphi}\rangle $$ Testing $\dot{W}$ against $(\varphi_1,\ldots,\varphi_N)$ is the same as testing $W$ against $(-\dot{\varphi}_1,\ldots,-\dot{\varphi}_N)$, and likewise testing $\ddot{W}$ against $(\varphi_1,\ldots,\varphi_N)$ is testing $W$ against $(\ddot{\varphi}_1,\ldots,\ddot{\varphi}_N)$; since $W$ is a generalized Gaussian random process, $\dot{W}$ and $\ddot{W}$ are as well.