What is "white noise" and how is it related to the Brownian motion?
In Chapter 1.2 of Stochastic Partial Differential Equations: An Introduction by Wei Liu and Michael Röckner, the authors introduce stochastic partial differential equations by considering equations of the form $$\frac{{\rm d}X_t}{{\rm d}t}=F\left(t,X_t,\dot B_t\right)$$ where $\left(\dot B_t\right)_{t\ge 0}$ is a "white noise in time" (whatever that means) with values in a separable Hilbert space $U$. $\left(\dot B_t\right)_{t\ge 0}$ is said to be the "generalized time-derivative of a $U$-valued Brownian motion $(B_t)_{t\ge 0}$".
Question: What exactly do the authors mean? What is a "white noise in time" and why (and in which sense) is it the "generalized time-derivative" of a Brownian motion?
You can skip the following, if you know the answer to these questions. I will present what I've found out so far:
I've searched the terms "white noise" and "distributional derivative of Brownian motion" on the internet and found only a few, mutually inconsistent definitions.
Definition 1: In the book An Introduction to Computational Stochastic PDEs the authors do the following: Let $(\phi_n)_{n\in\mathbb N}$ be an orthonormal basis of $L^2([0,1])$, e.g. $\phi_n(t):=\sqrt 2\sin(n\pi t)$. Then $$W_t:=\lim_{n\to\infty}\sum_{i=1}^n\phi_i(t)\xi_i\;\;\;\text{for }t\in [0,1]\;,$$ where the $\xi_i$ are independent and standard normally distributed random variables on a probability space $(\Omega,\mathcal A,\operatorname P)$, is a stochastic process on $(\Omega,\mathcal A,\operatorname P)$ with $\operatorname E[W_t]=0$ and $$\operatorname E[W_sW_t]=\delta(s-t)\;\;\;\text{for all }s,t\in [0,1]$$ where $\delta$ denotes the Dirac delta function. They call $(W_t)_{t\in [0,1]}$ white noise.
This definition seems to depend on the explicit choice of the orthonormal basis $(\phi_n)_{n\in\mathbb N}$, and I don't see the connection to a "derivative" of a Brownian motion (needless to say, I also don't see how this would generalize to a cylindrical Brownian motion).
However, maybe it has something to do with the following: Let $(B_t)_{t\ge 0}$ be a real-valued Brownian motion on $(\Omega,\mathcal A,\operatorname P)$. Then the Karhunen–Loève theorem yields $$B_t=\lim_{n\to\infty}\sum_{i=1}^n\sqrt{\zeta_i}\phi_i(t)\xi_i\;\;\;\text{for all }t\in [0,1]$$ in $L^2(\operatorname P)$ and uniformly in $t$, where $(\phi_n)_{n\in\mathbb N}$ is an orthonormal basis of $L^2([0,1])$ and $(\xi_n)_{n\in\mathbb N}$ is a sequence of independent standard normally distributed random variables on $(\Omega,\mathcal A,\operatorname P)$. In particular, $$\zeta_i=\frac 4{(2i-1)^2\pi^2}$$ and $$\phi_i(t)=\sqrt 2\sin\frac t{\sqrt{\zeta_i}}\;.$$
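To convince myself that this expansion really produces a Brownian motion, I wrote a quick numerical sketch (my own code, not from any of the cited books; the helper name `kl_brownian_paths` is mine): truncate the series and check that the empirical covariance matches $\operatorname E[B_sB_t]=\min(s,t)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncated Karhunen–Loève expansion of Brownian motion on [0, 1]:
#   B_t ≈ sum_{i=1}^n sqrt(zeta_i) * phi_i(t) * xi_i
# with zeta_i = 4 / ((2i - 1)^2 pi^2) and phi_i(t) = sqrt(2) sin(t / sqrt(zeta_i)).
def kl_brownian_paths(t, n_terms, n_paths, rng):
    i = np.arange(1, n_terms + 1)
    zeta = 4.0 / ((2 * i - 1) ** 2 * np.pi ** 2)                   # eigenvalues
    phi = np.sqrt(2.0) * np.sin(np.outer(t, 1.0 / np.sqrt(zeta)))  # (len(t), n_terms)
    xi = rng.standard_normal((n_paths, n_terms))                   # iid N(0, 1)
    return xi @ (np.sqrt(zeta) * phi).T                            # (n_paths, len(t))

t = np.array([0.3, 0.7])
paths = kl_brownian_paths(t, n_terms=200, n_paths=50_000, rng=rng)

# The empirical covariance should match E[B_s B_t] = min(0.3, 0.7) = 0.3.
cov = (paths[:, 0] * paths[:, 1]).mean()
print(cov)
```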
The authors state that we can formally consider the derivative of $B$ to be the process $$\dot B_t=\lim_{n\to\infty}\sum_{i=1}^n\phi_i(t)\xi_i\;.$$ I have no idea why.
Nevertheless, we may notice the following: Let $${\rm D}^{(\Delta t)}_t:=\frac{B_{t+\Delta t}-B_t}{\Delta t}\;\;\;\text{for }t\ge 0$$ for some $\Delta t>0$. Then $\left({\rm D}^{(\Delta t)}_t\right)$ is a stochastic process on $(\Omega,\mathcal A,\operatorname P)$ with $$\operatorname E\left[{\rm D}^{(\Delta t)}_t\right]=0\;\;\;\text{for all }t\ge 0$$ and $$\operatorname{Cov}\left[{\rm D}^{(\Delta t)}_s,{\rm D}^{(\Delta t)}_t\right]=\left.\begin{cases}\displaystyle\frac{\Delta t-|s-t|}{\Delta t^2}&\text{, if }|s-t|\le \Delta t\\0&\text{, if }|s-t|\ge \Delta t\end{cases}\right\}=:\eta^{(\Delta t)}(s-t)\;\;\;\text{for all }s,t\ge 0\;.$$ Since $$\int\eta^{(\Delta t)}(x)\;{\rm d}x=\int_{-\Delta t}^{\Delta t}\eta^{(\Delta t)}(x)\;{\rm d}x=1$$ we obtain $$\eta^{(\Delta t)}(x)\stackrel{\Delta t\to 0}\to\delta(x)\;,$$ but I have no idea how this is related to white noise.
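This difference-quotient computation is at least easy to test numerically. The following sketch (my own; all names and parameter choices are mine) estimates $\operatorname{Cov}\left[{\rm D}^{(\Delta t)}_s,{\rm D}^{(\Delta t)}_t\right]$ by Monte Carlo and compares it with $\eta^{(\Delta t)}(s-t)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo estimate of Cov[D_s, D_t] for the difference quotients
# D_t = (B_{t + dt} - B_t) / dt, compared with the triangular kernel
# eta(s - t) = (dt - |s - t|) / dt^2 for |s - t| <= dt.
dt = 0.2               # the Delta t of the difference quotient
s, t = 0.5, 0.6        # overlapping windows: |s - t| = 0.1 <= dt
n_paths = 400_000

# Only the increments over [0.5, 0.7] and [0.6, 0.8] matter; build them
# from independent increments over [0.5, 0.6], [0.6, 0.7], [0.7, 0.8].
g = rng.standard_normal((n_paths, 3)) * np.sqrt(0.1)
d_s = (g[:, 0] + g[:, 1]) / dt     # (B_{0.7} - B_{0.5}) / dt
d_t = (g[:, 1] + g[:, 2]) / dt     # (B_{0.8} - B_{0.6}) / dt

cov = (d_s * d_t).mean()
eta = (dt - abs(s - t)) / dt ** 2  # = 2.5
print(cov, eta)
```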
Definition 2: In Stochastic Differential Equations with Applications to Physics and Engineering, Modeling, Simulation, and Optimization of Integrated Circuits and Generalized Functions - Vol 4: Applications of Harmonic Analysis they take a real-valued Brownian motion $(B_t)_{t\ge 0}$ on $(\Omega,\mathcal A,\operatorname P)$ and define $$\langle W,\phi\rangle:=\int\phi(t)B_t\;{\rm d}\lambda(t)\;\;\;\text{for }\phi\in\mathcal D:=C_c^\infty([0,\infty))\;.$$ Let $\mathcal D'$ be the dual space of $\mathcal D$. We can show that $W$ is a $\mathcal D'$-valued Gaussian random variable on $(\Omega,\mathcal A,\operatorname P)$, i.e. $$\left(\langle W,\phi_1\rangle,\ldots,\langle W,\phi_n\rangle\right)\text{ is }n\text{-dimensionally normally distributed}$$ for all linearly independent $\phi_1,\ldots,\phi_n\in\mathcal D$, with expectation $$\operatorname E[W](\phi):=\operatorname E\left[\langle W,\phi\rangle\right]=0\;\;\;\text{for all }\phi\in\mathcal D$$ and covariance $$\varrho[W](\phi,\psi):=\operatorname E\left[\langle W,\phi\rangle\langle W,\psi\rangle\right]=\int\int\min(s,t)\phi(s)\psi(t)\;{\rm d}\lambda(s)\;{\rm d}\lambda(t)\;\;\;\text{for all }\phi,\psi\in\mathcal D\;.$$ Moreover, the generalized derivative $$\langle W',\phi\rangle:=-\langle W,\phi'\rangle\;\;\;\text{for }\phi\in\mathcal D\tag 1$$ is again a $\mathcal D'$-valued Gaussian random variable on $(\Omega,\mathcal A,\operatorname P)$ with expectation $$\operatorname E[W'](\phi)=0\;\;\;\text{for all }\phi\in\mathcal D\tag 2$$ and covariance \begin{equation} \begin{split} \varrho[W'](\phi,\psi)&=\int\int\min(s,t)\phi'(s)\psi'(t)\;{\rm d}\lambda(s)\;{\rm d}\lambda(t)\\ &=\int\int\delta(t-s)\phi(s)\psi(t)\;{\rm d}\lambda(t)\;{\rm d}\lambda(s) \end{split} \end{equation} for all $\phi,\psi\in\mathcal D$. Now they call a generalized Gaussian stochastic process with vanishing expectation as in $(2)$ and covariance given by the $\delta$-kernel above a Gaussian white noise.
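The key computation behind the covariance of $W'$ is the identity $\int\int\min(s,t)\phi'(s)\psi'(t)\,{\rm d}s\,{\rm d}t=\int\phi(t)\psi(t)\,{\rm d}t$ for test functions vanishing at the boundary. Here is a small numerical check (my own sketch; the test functions are arbitrary choices of mine):

```python
import numpy as np

# Numerical check of the identity behind the covariance of W':
#   \int \int min(s, t) phi'(s) psi'(t) ds dt = \int phi(t) psi(t) dt
# for smooth phi, psi vanishing at the ends of [0, 1].
n = 2000
h = 1.0 / n
t = (np.arange(n) + 0.5) * h             # midpoint grid on [0, 1]

phi = np.sin(np.pi * t) ** 2             # arbitrary test functions
psi = t ** 2 * (1.0 - t) ** 2

dphi = np.gradient(phi, h)
dpsi = np.gradient(psi, h)
m = np.minimum.outer(t, t)               # the kernel min(s, t)

lhs = dphi @ m @ dpsi * h ** 2           # double Riemann sum
rhs = phi @ psi * h                      # \int phi psi dt
print(lhs, rhs)
```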
Thus, the generalized derivative $W'$ of the generalized Brownian motion $W$ is a Gaussian white noise.
Again, I don't know how I would need to generalize this to the case of a cylindrical Brownian motion. Moreover, this definition seems less natural to me, and I don't think that this is the notion Liu and Röckner had in mind.
Definition 3: In some lecture notes, I've seen the following definition: Let $W$ be a centered Gaussian process, indexed by test functions $\phi\in C_c^\infty([0,\infty)\times\mathbb R^d)$, whose covariance is given by $$\operatorname E\left[W_\phi W_\psi\right]=\int_0^\infty{\rm d}t\int_{\mathbb R^d}{\rm d}x\int_{\mathbb R^d}{\rm d}y\,\phi(t,x)\psi(t,y)\delta(x-y)\tag 3$$ or $$\operatorname E\left[W_\phi W_\psi\right]=\int_0^\infty{\rm d}t\int_{\mathbb R^d}{\rm d}x\,\phi(t,x)\psi(t,x)\tag 4\;.$$ Then $W$ is called "white noise in time and colored noise in space" in case $(3)$ and "white noise, both in time and space" in case $(4)$. They simply state that $\delta$ is some "reasonable" kernel which might blow up to infinity at $0$.
I suppose this is related to Definition 2. Again, I don't know how I would need to generalize this to the case of a cylindrical Brownian motion.
Definition 4: This definition is very sloppy in its notation: Let $(W_t)_t$ be a centered Gaussian process with covariance $\operatorname E[W_sW_t]=\delta(s-t)$, where $\delta$ denotes the Dirac delta function. Then, in a lecture note I've found (Example 3.56), they state that $$B_t:=\int_0^tW_s\;{\rm d}s\tag 5\;\;\;\text{for }t\ge 0$$ is a real-valued Brownian motion. I haven't verified that result. Is it correct? Whatever the case is, if this is the reason why white noise is considered to be the derivative of a Brownian motion, we should be able to show that every Brownian motion has a representation of the form $(5)$. Can this be shown?
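At least a discretized version of $(5)$ is easy to sanity-check (my own sketch; I replace the white noise on a grid of mesh $h$ by independent $N(0,1/h)$ variables, which is one common heuristic reading of $\operatorname E[W_sW_t]=\delta(s-t)$): the resulting integral process has $\operatorname{Var}[B_t]\approx t$, as a Brownian motion should.

```python
import numpy as np

rng = np.random.default_rng(2)

# Discretized version of B_t = \int_0^t W_s ds: on a grid of mesh h,
# replace the white noise W by iid N(0, 1/h) variables, so each term
# W_s * h of the Riemann sum is an N(0, h) increment.
h = 0.005
n_steps = 200                    # grid on [0, 1]
n_paths = 50_000

w = rng.standard_normal((n_paths, n_steps)) / np.sqrt(h)  # W ~ N(0, 1/h)
b = np.cumsum(w * h, axis=1)                              # B on the grid

var_half = b[:, 99].var()        # t = 0.5: 100 increments of variance h
var_one = b[:, 199].var()        # t = 1.0
print(var_half, var_one)         # should be close to 0.5 and 1.0
```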
The same questions as above remain.
Definition 5: Let $(B_t)_{t\ge 0}$ be a real-valued Brownian motion on $(\Omega,\mathcal A,\operatorname P)$ and define $$\langle W,\varphi\rangle:=\int_0^\infty\varphi(s)\;{\rm d}B_s\;\;\;\text{for }\varphi\in\mathcal D:=C_c^\infty((0,\infty))\;.$$ Then $$\langle W',\varphi\rangle:=-\int_0^\infty\varphi'(s)\;{\rm d}B_s\;\;\;\text{for }\varphi\in\mathcal D$$ is considered to be the generalized derivative of the generalized Brownian motion $W$.
The same questions as above remain.
Conclusion: I've found different notions of "white noise" and "generalized derivative" of a Brownian motion, but I don't know in which sense they are consistent and which of them Liu and Röckner meant. So, I would be very happy if someone could give a rigorous definition of these terms in the case of a cylindrical Brownian motion or at least in the case of a Hilbert space valued Brownian motion.
I've taken a look at the book and this chapter is introductory and somewhat informal, so I imagine the authors are more specific about what they mean by a white noise in space or time and what they mean by the S(P)DE in your question in later chapters. Nevertheless, I have addressed aspects of your question below.
A discussion of Definitions 2, 3 and 5 is contained in an answer of mine to a similar question here. Everything in that answer is real-valued (which hopefully doesn't make too much of a difference) and indexed by a single real variable (or more precisely a test function of a single real variable); this can make a significant difference depending on what you want to know.
Definition 2
The random distribution that acts on $\phi$ via $(W, \phi) = \int \phi(t) B_t dt$ is just the Brownian motion $B$ (i.e. we can identify the function $B$ with the distribution $W$).
Your definition of $W'$ is then how I define white noise (denoted $X$) in the answer linked to above: white noise $X$ is defined as the random distribution that acts on a test function $\phi$ by $(X, \phi) = -\int_0^\infty B(t) \phi'(t) dt$. In the parlance of the book you cite, this is a white noise in time (time is the only variable in that answer). However, you can generalize this definition to white noise in space and time (see the discussion of Definition 3 below).
Definition 3
Here $W$ is your white noise (not $W'$ as in Definition 2).
To link this to Definition 2, set $d = 0$ (so there is no spatial component to the domain of $\phi$). With $X$ defined as above, $(X_\phi := (X, \phi) : \phi \in C_c^\infty([0, \infty)))$ is a centered Gaussian process with covariance $E(X_\phi X_\psi) = (\phi, \psi)_{L^2}$ (by the Ito isometry). The definition you have stated is a generalization to the case where the process is indexed by space and time (more precisely, by test functions of space and time).
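The Ito isometry here is easy to see numerically. A quick Monte Carlo sketch (my own code, not from the linked answer; the integrands $\phi, \psi$ are arbitrary choices), approximating $\int\phi\,dB$ by left-point Riemann sums against Brownian increments:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of the Ito isometry E[X_phi X_psi] = (phi, psi)_{L^2}:
# approximate \int phi dB by a left-point Riemann sum against the
# increments of a Brownian motion on [0, 1].
n_steps = 400
h = 1.0 / n_steps
t = np.arange(n_steps) * h
phi = np.sin(np.pi * t)          # arbitrary deterministic integrands
psi = t

n_paths = 40_000
db = rng.standard_normal((n_paths, n_steps)) * np.sqrt(h)  # dB increments
x_phi = db @ phi                 # \int phi dB
x_psi = db @ psi                 # \int psi dB

lhs = (x_phi * x_psi).mean()
rhs = (phi * psi).sum() * h      # (phi, psi)_{L^2} = 1/pi ≈ 0.318
print(lhs, rhs)
```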
Definition 5
Your definition of $W$ is the same (by stochastic integration by parts) as the definition of $X$ above. Thus, $W$ here is once again white noise ($W'$ is then the distributional derivative of white noise).
Definition 1
In this definition, while the realization of the process you get in this way depends on the choice of basis, its (probability) distribution is independent of basis. You can think of a white noise as any process with this distribution.
This definition must be understood in the sense of distributions (now referring to Schwartz distributions), as white noise is not defined pointwise (so $W_t$ is meaningless). A more precise definition is that $W$ acts on a test function $\phi$ by $W_\phi := (W, \phi) = \sum_{i=1}^\infty \xi_i (\phi, \phi_i)$. Now you can check that $W_\phi$ has mean $0$ and that \begin{equation} E(W_\phi W_\psi) = E\sum_{i,j=1}^\infty \xi_i \xi_j (\phi, \phi_i) (\psi, \phi_j) = \sum_{i=1}^\infty (\phi, \phi_i) (\psi, \phi_i) = (\phi, \psi)_{L^2}. \end{equation} Thus, the only thing left to check to see that $W$ has the same distribution as the processes above is that it is Gaussian.
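The middle step above is just Parseval's identity, which can be checked numerically with the sine basis from Definition 1 (my own sketch; the test functions and truncation sizes are arbitrary):

```python
import numpy as np

# Check of the Parseval step sum_i (phi, phi_i)(psi, phi_i) = (phi, psi)_{L^2}
# with the sine basis phi_i(t) = sqrt(2) sin(i pi t) on [0, 1].
n_grid = 10_000
h = 1.0 / n_grid
t = (np.arange(n_grid) + 0.5) * h                      # midpoint grid

phi = t * (1.0 - t)                                    # arbitrary test functions
psi = np.exp(t)

n_terms = 1000
i = np.arange(1, n_terms + 1)
basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(i, t))  # (n_terms, n_grid)

c_phi = basis @ phi * h                                # (phi, phi_i)
c_psi = basis @ psi * h                                # (psi, phi_i)
lhs = np.dot(c_phi, c_psi)                             # truncated Parseval sum
rhs = np.dot(phi, psi) * h                             # (phi, psi)_{L^2} = 3 - e
print(lhs, rhs)
```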
I'm a physicist, so I'll give you the "dirty" answer. No definitions, theorems, or proofs.
Let's start with Brownian motion. Brownian motion is the path taken by tiny particles in a viscous fluid due to being bombarded by the random thermal motion of the fluid molecules. There are two main modeling approaches. Einstein used a limited derivation of the Fokker–Planck equation to show that an ensemble of such particles obeys the diffusion equation. Langevin took a noise approach and showed that a particle with a small amount of momentum driven by small uncorrelated impacts follows a path with an exponentially decaying autocorrelation.
Turning back to white noise (I'll tie it all together eventually), assuming spatial and temporal homogeneity (the conditions in the infinite beaker are the same everywhere and do not change in time), then the tiny impacts to the particle constitute a sort of noise signal in time. Maybe they are the readings on an impact meter strapped to the particle. Because these impacts are 1) uncorrelated, 2) independent, and 3) comprised of an enormous amount of other collisions taking place in the fluid between hitting the Brownian particles, their magnitude has a Gaussian distribution.
If you take the Fourier transform (in real life, the FFT) of the impact signals for a large ensemble of Brownian particles and average them, you find that the power spectrum is constant over all frequencies, and that the power at a given frequency is distributed across the ensemble as a Gaussian around the mean. Since the impact signals are thus (on average) a combination of equal portions of all frequencies, we call this "white" noise, as in light comprised of all frequencies of visible light.
Back to the particles, their motion is the sum of a large number of these white noise impacts. If you are willing to consider this an integral in time, then you know that the power spectrum of this time integral will be proportional to $1/f^2$. This is the power spectrum of Brownian motion. Taking this the opposite way, from the Langevin equation you can see that the motion of a Brownian particle has an exponentially decaying autocorrelation. This corresponds to a power spectrum that decays as $1/f^2$. Calculating the derivative in frequency space, the derivative of Brownian motion looks like white noise.
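This spectral picture is easy to reproduce numerically (a rough sketch of my own; the sizes and the low-frequency cutoff are arbitrary): the averaged periodogram of discrete white noise is flat, while that of its cumulative sum (a random walk, i.e. discrete Brownian motion) decays like $1/f^2$, so their log-log slopes come out near $0$ and $-2$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Averaged periodograms: white noise has a flat spectrum; its cumulative
# sum (a discrete Brownian motion) has a spectrum decaying like 1/f^2.
n, n_paths = 4096, 400
noise = rng.standard_normal((n_paths, n))
walk = np.cumsum(noise, axis=1)

def avg_power(x):
    return (np.abs(np.fft.rfft(x, axis=1)) ** 2).mean(axis=0)

lo = slice(1, n // 16)           # low-frequency band, skipping f = 0
f = np.fft.rfftfreq(n)[lo]

# Log-log slopes: about 0 for white noise, about -2 for the walk.
slope_noise = np.polyfit(np.log(f), np.log(avg_power(noise)[lo]), 1)[0]
slope_walk = np.polyfit(np.log(f), np.log(avg_power(walk)[lo]), 1)[0]
print(slope_noise, slope_walk)
```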
None of this is physically or mathematically rigorous. But this is the general modeling approach used in physics and digital signal processing. These dirty models are, however, the basis, and I might even say the raison d'être, for the rigorous mathematical models. Being dirty, there are probably multiple ways white noise and Brownian motion can be defined. So it may simply be that when you read book A or paper X you need to use their definition.