Understanding the Fokker-Planck equation for non-stationary processes

In the Fokker-Planck equation, the unknown function (called here $p$) is a spatial probability density function at a given time $t$. We can write the Fokker-Planck equation as follows: $$\left\{\begin{array}{ll}\frac{\partial p}{\partial t}+\frac{\partial}{\partial x}\big(\mu(x,t)p\big) -\frac{\partial^2}{\partial x^2}\left(D(x,t)p\right)=0\\ p(x,t_0)=f(x).\end{array}\right.$$ This governs the evolution of the probability distribution, starting from the initial condition $f(x)$.

An important property of the Fokker-Planck equation is the so-called mass conservation: the quantity $\int_{\mathbb R}p(x,t)\,\mathrm dx$ is independent of $t$ and is equal to $\int_{\mathbb R}f(x)\,\mathrm dx$. In particular, if $f(x)=\delta(x-x_0)$, we have $\int_{\mathbb R}p(x,t)\,\mathrm dx=1$ for all times $t$.

As the equation is linear, we deduce that if we call $p(x,t|x_0,t_0)$ the solution of the Fokker-Planck equation with $f(x)=\delta(x-x_0)$, then the solution $p(x,t|f,t_0)$ to the Fokker-Planck equation starting at $t_0$ with the probability distribution $f$ is $$p(x,t|f,t_0)=\int_{\mathbb R}p(x,t|x_0,t_0)f(x_0)\,\mathrm dx_0.$$ Note that this is another form of the "total probability rule" (also called the Chapman-Kolmogorov relation), because $f(x)=p(x,t_0|f,t_0)$. We have just shown that $p(x,t|x_0,t_0)$ is the Green's function of the operator $$\frac{\partial}{\partial t}+\frac\partial{\partial x}\Big(\mu(x,t)\cdot\Big)-\frac{\partial^2}{\partial x^2}\Big(D(x,t)\cdot\Big).$$
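To make the superposition formula concrete, here is a minimal numerical sketch, assuming for illustration only the simplest case $\mu=0$ and constant $D$ (where the Green's function is the familiar heat kernel). It propagates an arbitrary initial density $f$ through the kernel and checks mass conservation:

```python
import numpy as np

# Minimal sketch, assuming mu = 0 and constant D, where the Green's function
# is the heat kernel G(x,t|x0,t0) = exp(-(x-x0)^2/(4 D (t-t0))) / sqrt(4 pi D (t-t0)).
D, t0, t = 1.0, 0.0, 0.5
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

# An arbitrary initial density f (two bumps), normalized so its mass is 1.
f = np.exp(-(x - 2)**2) + np.exp(-(x + 2)**2)
f /= f.sum() * dx

# p(x,t|f,t0) = integral of G(x,t|x0,t0) f(x0) dx0, done as a Riemann sum.
G = np.exp(-(x[:, None] - x[None, :])**2 / (4*D*(t - t0))) / np.sqrt(4*np.pi*D*(t - t0))
p = G @ f * dx

print(p.sum() * dx)  # ~ 1.0: the mass of f is conserved
```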

In the case of time-dependent coefficients $\mu$ and $D$, the question becomes very much dependent on the actual expressions of $\mu$ and $D$. For instance, if $D(x,t)=\mathscr Dt$ and $\mu=0$, the solution is exactly obtained from the usual Green's function as $$p(x,t|0,0)=\frac{1}{\sqrt{2\pi\mathscr Dt^2}}\exp\left(-\frac{x^2}{2\mathscr D t^2}\right).$$ But this is an exceptional situation; there are usually no exact solutions. If the time variations of $\mu$ and $D$ are bounded, a possible approach is a multiple-scale expansion, which consists of introducing several slow time scales and a perturbation parameter. This is more robust than the standard perturbative expansion, especially for time-dependent problems. Many other techniques exist, such as the method of matched asymptotic expansions. Solving time-dependent partial differential equations is a difficult problem in general.
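If you want to double-check that exact solution, here is a quick symbolic sketch with sympy (the symbol `Ds` stands for the constant $\mathscr D$):

```python
import sympy as sp

x = sp.Symbol('x', real=True)
t, Ds = sp.symbols('t D', positive=True)  # Ds stands for the constant script-D

# Candidate solution for mu = 0 and D(x,t) = Ds*t:
p = sp.exp(-x**2 / (2*Ds*t**2)) / sp.sqrt(2*sp.pi*Ds*t**2)

# The Fokker-Planck residual dp/dt - d^2(Ds*t*p)/dx^2 should vanish identically.
print(sp.simplify(sp.diff(p, t) - sp.diff(Ds*t*p, x, 2)))  # -> 0
```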


You may or may not be familiar with the result that a random walk, even a one-dimensional one with equal probability for both kinds of steps, has, physically speaking, a rather odd behavior regarding the expected walking distance.

Imagine a drunk guy on a street with large tiles, who wanders around without self-control. Let's fix a time and a distance scale: say at each second he makes one step over a tile, either to the left or to the right, with equal chance for either side. At second $t=0$, you have

$p(x=0, t=0)=1=1\cdot 2^{0}$

and zero chance for any other position. At second $t=1$, he moved away to the left or to the right, so

$p(x=0, t=1)=0$

and

$p(x=\pm 1, t=1)=\frac{1}{2}=1\cdot 2^{-1}$

At $t=2$, there's a chance of $\left(\frac{1}{2}\right)^{2}$ that he's landed on the outermost possible position, so

$p(x=\pm 2, t=2)=\frac{1}{4}=1\cdot 2^{-2}$

while there are two paths he could have taken to end up in the middle again

$p(x=0, t=2)=2\cdot 2^{-2} = \frac{1}{2}$

You can compute the chances from the diagram below.

[figure: random-walk probability diagram for the first few steps]

Here it's clear that the outermost position will have the exponentially falling chance of $2^{-n}$, because it would mean you get e.g. "he went left" $n$ times in a row. On the other hand, there are more and more paths that turn around and come back to the center.

Importantly, the distribution of probabilities is one that spreads out.

Surely, the expected position after any number of time steps, i.e. the average over all possible paths, is zero by symmetry. Let's say ${\rm E}[X_n]=0$.

But the movement is anything but linear and the expected distance (unsigned, disregarding whether he effectively moved left or right) turns out to go as $n^\tfrac{1}{2}$, i.e. "$|x(t=n)|=\sqrt{n}$". Or, to capture this more formally, ${\rm E}[X_t^2]=t$ or, for the next paragraph, ${\rm E}[X_t^2]^\tfrac{1}{2}=t^\tfrac{1}{2}$.
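If you'd rather see these two facts empirically, here is a small Monte Carlo sketch (the numbers of steps and walkers are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, walkers = 1000, 50_000                   # arbitrary experiment sizes

# Each row is one drunk guy: n steps of +1/-1 with equal probability.
steps = 2 * rng.integers(0, 2, size=(walkers, n), dtype=np.int8) - 1
X = steps.sum(axis=1, dtype=np.int64)       # final positions after n steps

print(X.mean())                             # ~ 0    : E[X_n] = 0
print((X**2).mean())                        # ~ 1000 : E[X_n^2] = n
print(np.sqrt((X**2).mean()))               # ~ 31.6 : RMS distance, sqrt(n)
```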

I said this is physically odd, because you can't represent this typical (unsigned) path by an ODE, like you'd do in mechanics 101. The function $x(t)=\sqrt{t}$ has no velocity

$v(t=0):=\lim_{h\to 0^+}\dfrac{x(0+h)-x(0)}{h}=\lim_{h\to 0^+}\dfrac{1}{\sqrt{h}}$

at the origin, because this limit diverges. This is clear from how steep the graph of $\sqrt{t}$ is at the origin.

The lesson is that you want to use a whole pdf to describe e.g. a particle's motion in situations where the random-walk model applies, e.g. in gases, where the random steps come from unpredictable pushes by other particles from all sides.

This all might seem like a long preamble, but the takeaway I want you to get from it is that an unevenness in powers arose here: $${\rm E}[X_t^2]=t$$

You have a length unit to the power of 2 on the left, and a time unit to the power of 1 on the right. This is the pattern that runs through everything related to the Brownian process, which provides the distribution for this situation: the information that I captured for the first $n=5$ steps in the discrete case above. The probability density w.r.t. $x$ looks like $$p_D(x,t) = \dfrac{1}{\sqrt{4\pi}}\left(\dfrac{1}{D\,t}\right)^\frac{1}{2}\exp\left({-\dfrac{1}{D\,t}\left(\dfrac{x}{2}\right)^2}\right)$$
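As a sanity check that this Gaussian really matches the discrete walk above, one can compare the exact binomial probabilities with $p_D$. A short sketch: with one unit step per second the matching choice is $D=1/2$ (the variance of $p_D$ is $2Dt$ and ${\rm E}[X_n^2]=n$), and the factor $1/2$ below accounts for the spacing of $2$ between reachable sites:

```python
import math
import numpy as np

n = 50
k = np.arange(-n, n + 1, 2)                    # sites reachable after n steps
# Exact probabilities from the triangle above: P(X_n = k) = C(n, (n+k)/2) / 2^n
pmf = np.array([math.comb(n, (n + kk) // 2) for kk in k]) / 2.0**n

D = 0.5                                        # so that 2*D*n = n = E[X_n^2]
gauss = np.exp(-k**2 / (4*D*n)) / np.sqrt(4*np.pi*D*n)

# Reachable sites are 2 apart, so pmf/2 should approximate the density p_D:
print(np.abs(pmf / 2 - gauss).max())           # small
```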

That density function is such that for small $t$ it's very sharp and high, because of the $\frac{1}{\sqrt{t}}$, and then as $t$ grows it falls and spreads.

[figure: plots of $p_D(x,t)$ for three increasing values of $t$]

Here $D$ regulates how the time and length scales interact. You may choose a different time scale and thus renormalize $D$ to $1$.

At this point you want to do a unit analysis. Since this is a density w.r.t. $x$, you find that $\left(\dfrac{1}{D\,t}\right)^\frac{1}{2}$ must have the units of one over length, and thus $D$ has units of length squared per time, i.e. the same units as $x^2/t$. You may also arrive at that conclusion from $\dfrac{1}{D\,t}\left(\dfrac{x}{2}\right)^2$, by the observation that the argument of a power series like the $\exp$ must be unitless. It may be worth mentioning that now $v_D(t):=\left(\dfrac{D}{t}\right)^\frac{1}{2}$ makes for a velocity.

Now take a look at the differential equation this exp-function is the solution of, the diffusion equation $$\dfrac{\partial}{\partial{}t} p_D(x,t) = D \dfrac{\partial^2}{\partial{}x^2} p_D(x,t)$$

Here again, check the units. For this to check out, $D$ must share the units of $x^2/t$.
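And sympy can confirm that the $p_D$ above indeed solves this equation; a quick sketch:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
t, D = sp.symbols('t D', positive=True)

# The p_D from above, rewritten as exp(-x^2/(4*D*t)) / sqrt(4*pi*D*t):
p = sp.exp(-x**2 / (4*D*t)) / sp.sqrt(4*sp.pi*D*t)

# Diffusion-equation residual dp/dt - D * d^2p/dx^2 should vanish identically.
print(sp.simplify(sp.diff(p, t) - D*sp.diff(p, x, 2)))  # -> 0
```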

Let's take a step back and try to understand why this describes diffusion in the first place. Consider the function $h(x):=7 x^2$. We have

$\dfrac{\partial^2}{\partial{}x^2} h(x) = 7\cdot 2>0$

Consider the function $k(x):=-5 x^2$. We have

$\dfrac{\partial^2}{\partial{}x^2} k(x) = -5\cdot 2<0$

The equation says that the gain of the density at a position $x=\zeta$, i.e. the rate of change $\dfrac{\partial}{\partial{}t} p_D(x=\zeta, t)$, equals (up to the factor $D$) the curvature of the function at the same position $x=\zeta$. Now look at the plots with the three $\exp$-functions posted above. Wherever the function (or any function) behaves concave like $-x^2$, the differential equation will steal from there, and wherever the function is convex like $+x^2$, the differential equation will reward there. This is also why the inflection points, where the curvature changes sign, hardly move: there $\frac{\partial^2}{\partial x^2}p_D=0$, so the density momentarily doesn't change. The differential equation describes a diffusion of value from concave to convex regions. And the differential operator is linear, meaning that if you overlap two exponentials with different center points, say, then the resulting two concave peaks will be punished just the same.
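You can watch this concave-to-convex flow of value in a few lines. A minimal explicit finite-difference sketch (grid, time step and initial bump are arbitrary choices; the time step must stay below $\Delta x^2/(2D)$ for stability):

```python
import numpy as np

D, dx, dt = 1.0, 0.1, 0.004           # dt below dx^2/(2D) for stability
x = np.arange(-5.0, 5.0, dx)
p = np.exp(-x**2)                     # an initial concave peak
p /= p.sum() * dx                     # normalize to a density

for _ in range(500):
    # Discrete curvature; np.roll gives periodic ends, harmless while the
    # bump stays far from the boundary.
    curv = (np.roll(p, -1) - 2*p + np.roll(p, 1)) / dx**2
    p += dt * D * curv                # concave regions lose, convex regions gain

print(p.max())          # the peak has dropped
print(p.sum() * dx)     # ~ 1.0: total probability is conserved
```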

The function describes the drunken guy's motion in this sense. You may say at $t_0$ a person is at $x_0$ (the $p(x,t_0|x_0,t_0)=\delta(x-x_0)$ initial condition you mention), and then $p(x,t|x_0,t_0)$ will in this context give you, at time $t$, the $x$-distribution as it evolved from $x_0$ after the time $\Delta t = t-t_0$ has passed. The classical Green's function application would be when you have an electrical charge that acts as a source for an electric field, ${\rm div}\, E=\delta$. Here $p$ acts like a Green's function in that, given certainty of position $x_0$ at $t_0$, it tells you how that knowledge diffuses; and if you've got 10 independent drunken men you can't distinguish, then you have 10 densities that eventually merge together, leaving you with one more or less flat blob of "I don't know where they are now". The bulk here just always spreads out and away from its own peaks, not from an external source.

Your $g$ would be an initial distribution (it doesn't need to be 10 sharp peaks); it naturally depends on $x$ and is given for a particular time $t_0$.

Yes, as far as the context in your question goes, it doesn't matter that it's a conditional probability and formally depends on $x',t'$. The function could also depend on your mom's bank account. The differential equation is one in $x$ and $t$, and the normalization is w.r.t. $x$. And if the initial condition is one with a delta at $x_0$, then the solution will carry the $x_0$ anyway. Keeping track of the $x_0$ is relevant when you do e.g. path integrals in quantum mechanics (the Schrödinger equation also has this form, but with imaginary $D$), or Kalman filters/recursive Bayesian estimation in sensor fusion (i.e. whenever you do anything that looks like a smooth version of Bayes' theorem).

And an equation with non-constant $D(x,t)$ just means the peak-penalty is determined locally, just like smoking weed is penalized differently in different countries. The $\mu$ (check its units) induces a drift of the center in time, or a deterministic pull if you consider stochastic processes.