Confusions about Radon-Nikodym derivative and dominating measures

I have some difficulty understanding the Radon-Nikodym derivative and linking it to the ordinary way of obtaining a probability density function, namely by differentiating the cumulative distribution function (c.d.f.). How can one intuitively connect the Radon-Nikodym derivative to the non-measure-theoretic definition of the probability density function?

Let me be more precise. If we have a c.d.f., then even if it is discontinuous, one can take the derivative, and at the points of discontinuity we obtain Dirac delta functions. How can one see this from the Radon-Nikodym point of view? I guess it might even happen that the c.d.f. is almost nowhere differentiable; then we won't have a density. Is this case directly observable from the Radon-Nikodym point of view?

(Radon-Nikodym Theorem):

Let $(\Omega,\mathcal{F})$ be a measurable space carrying two $\sigma$-finite measures $\mu$ and $\nu$, where $\nu$ is absolutely continuous with respect to $\mu$; that is, for every set $A\in\mathcal{F}$, $\mu(A)=0\Longrightarrow \nu (A)=0$. Then there exists a measurable function $f:\Omega\rightarrow [0,\infty)$, unique up to $\mu$-null sets, such that

$$\nu(A)=\int_A f \mathrm{d} \mu\quad\forall A\in\mathcal{F}$$
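To make the relation $\nu(A)=\int_A f\,\mathrm d\mu$ concrete, here is a small numeric sketch; the particular choice $\mu$ = Lebesgue measure on $[0,1]$ and $f(x)=2x$ is my own toy example, not from the theorem itself.

```python
# Toy check of nu(A) = ∫_A f dmu with mu = Lebesgue measure on [0, 1]
# and RN derivative f(x) = 2x, so that nu([0, b]) = b**2.

def f(x):
    return 2.0 * x

def nu(a, b, n=100_000):
    """Approximate nu([a, b]) = ∫_a^b f(x) dx by a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

print(nu(0.0, 0.5))  # ≈ 0.25 = 0.5**2
```

The midpoint rule is exact for a linear integrand, so the printed value matches $0.5^2$ up to floating-point rounding.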

Question 1: My first question is about the definition of absolute continuity of one measure w.r.t. another. It is only required that whenever $\mu$ gives zero measure to a set $A$, so does $\nu$; apparently that is enough for the density function to exist. How is this absolute continuity linked to the absolute continuity of functions? I know that an absolutely continuous function is differentiable almost everywhere and equals the integral of its derivative, but I cannot link this to the definition used in the Radon-Nikodym theorem. Does the definition in terms of measures fill the gap?

Question 2: Consider the set of probability measures $$\mathcal{Q}=\{Q:Q=(1-\epsilon)P+\epsilon H,\, H\in\mathcal{H}\}$$ where $\mathcal{H}$ is the set of all probability measures and $P$ is a probability measure which has a density $p$. It seems there are some $Q\in\mathcal{Q}$ which are not absolutely continuous with respect to $P$. As far as I understand, this only says that those $Q$ cannot have a density with respect to $P$, but they can still have a density with respect to some other measure, right? In any case, since $H$ can be any probability measure, I would think there must be some $Q$ which do not admit any density function at all, especially if $H$ has some abrupt changes. Am I wrong?

Question 3: Now consider the set $$\mathcal{G}=\{g:D(g,f)\leq \epsilon\}\quad D(g,f)=\int g\log(g/f)\mathrm{d}\mu$$ where $f$ and $g$ are density functions. In this set every density exists by assumption, so the corresponding probability measures are Radon-Nikodym differentiable. Is it true that every measure $G_1$ corresponding to some $g_1\in \mathcal{G}$ is absolutely continuous with respect to any other measure $G_2$ corresponding to $g_2\in \mathcal{G}$, and therefore that any $G$ corresponding to $g\in \mathcal{G}$ is also absolutely continuous w.r.t. the measure $F$ corresponding to the density $f$? How does the set in Question 3 compare to the set in Question 2 in terms of the existence of densities and absolute continuity?

My last question is a notational issue. In the papers I read, it is assumed that $F$ and $G$ are absolutely continuous w.r.t. some dominating measure, e.g. $\mu=F+G$. Then I know that $f$ and $g$ exist, but can one use the same $\mu$ for every $G$? For example, when they define $D(g,f)=\int g\log(g/f)\mathrm{d}\mu$, there is only one $\mu$ but uncountably many measures $G$, and apparently each of them has a density with respect to some measure, say $\phi_G$; but then what guarantees that $\mu=\phi_G$ for all $G$?

It seems I have a lot of confusions. I hope you can help me clarify these issues. Thank you very much for reading this post; any comment or answer will be highly appreciated.


Solution 1:

I learned about the RN derivative from "Real Analysis" by Folland, and would advise you to check it out (Chapter 3), as it may answer your coming questions. In particular, Theorem 3.5 answers your Q1. It states that

If $\nu$ is a finite signed measure and $\mu$ is a positive measure, then $\nu\ll \mu$ iff for any $\varepsilon >0$ there exists $\delta > 0$ such that $\mu(E)<\delta$ implies $|\nu(E)|<\varepsilon$ for any measurable $E$.

Now, if $\nu$ is our probability measure and $F$ is the corresponding CDF, then choosing $E = \bigcup_{k=1}^n(t_k,t_{k+1}]$ shows that $\nu\ll \lambda$ implies that $F$ is absolutely continuous (as a function). Here $\lambda$ denotes the Lebesgue measure.
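As a hedged concrete instance of Theorem 3.5 (a special case I am adding, not part of Folland's statement): suppose $\nu$ has a density $f$ w.r.t. $\mu$ that is bounded, $f \le M$ $\mu$-a.e. Then for any measurable $E$,
$$ \nu(E) = \int_E f\,\mathrm d\mu \;\le\; M\,\mu(E), $$
so $\delta = \varepsilon/M$ works explicitly: $\mu(E)<\delta$ forces $\nu(E)<\varepsilon$. The theorem says that a $\delta$ exists even without boundedness, where this simple bound is no longer available.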

Regarding Q2: a density is always defined relative to another measure. Whatever measure $Q$ you take, it always has a density w.r.t. itself (please tell me if this fact is not clear to you). Furthermore, indeed, if $P = \lambda$ and $H = \delta_0$, then $Q$ does not admit a density w.r.t. $P$; however, it clearly admits a density w.r.t. $Q$ itself.
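The failure of $Q \ll P$ in this example comes down to a single null set. Here is a minimal sketch, with $P$ = Uniform$(0,1)$ standing in for $\lambda$ restricted to $[0,1]$ and $\epsilon = 0.1$ my own choice:

```python
eps = 0.1  # mixture weight, chosen only for illustration

def P(length, contains_zero=False):
    """P = Uniform(0,1): assigns each subset of [0,1] its length.
    P has no atoms, so whether the set contains 0 is irrelevant."""
    return length

def Q(length, contains_zero=False):
    """Q = (1 - eps)*P + eps*delta_0: adds an atom of mass eps at 0."""
    return (1 - eps) * length + (eps if contains_zero else 0.0)

# The singleton A = {0}: length zero, but it contains the atom.
p0 = P(0.0, contains_zero=True)  # P({0}) = 0
q0 = Q(0.0, contains_zero=True)  # Q({0}) = eps > 0, so Q is not << P
```

Since $P(\{0\})=0$ but $Q(\{0\})=\epsilon>0$, no density of $Q$ w.r.t. $P$ can exist; yet $Q$ has the constant density $1$ w.r.t. itself.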

In probability theory it may be confusing that most of the time we talk about densities w.r.t. $\lambda$, so that we do not even mention $\lambda$ and just say "density". For that reason you may forget that we are talking about a relative density; there is no "absolute" density, at least in measure theory. There, the density is exactly the RN derivative, hence it requires specifying the "denominator" measure.

Q3: I am not sure what exactly you mean here. If $\nu\ll\mu$ we can define KL divergence by $$ D(\nu,\mu) := \int \log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\nu = \int \frac{\mathrm d\nu}{\mathrm d\mu}\log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\mu \tag{1} $$ and this is defined purely in terms of measures, so it does not depend on their representation through densities.
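The two integrals in $(1)$ are the same number whenever $\nu\ll\mu$; here is a toy discrete check (the pmfs are my own made-up numbers), where the RN derivative is just the pointwise ratio of probabilities:

```python
import math

# Two strictly positive pmfs on {0, 1, 2}; take nu = p, mu = q,
# so the RN derivative d(nu)/d(mu) is p_i / q_i at point i.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# First form of (1): integrate log(dnu/dmu) against nu.
kl_nu = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Second form: integrate (dnu/dmu) * log(dnu/dmu) against mu.
kl_mu = sum((pi / qi) * math.log(pi / qi) * qi for pi, qi in zip(p, q))

print(kl_nu, kl_mu)  # identical up to floating point
```

That the two sums agree term by term is exactly the change-of-measure identity $\mathrm d\nu = \frac{\mathrm d\nu}{\mathrm d\mu}\,\mathrm d\mu$.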

Regarding your title question, please check out this and that. I'm expecting you'll reconsider and (or) reformulate your question after reading this answer, unless everything already became clear to you. Just come back and we can proceed. And I encourage you to check Folland's book in general.

Added: let's agree on the following: since there is some confusion regarding the notion of density, we only use the terms "function" and "RN derivative". We can define the KL divergence $D(\nu,\mu)$ for measures $\nu\ll\mu$ as in $(1)$. We can also fix some reference measure $\psi$ and define a similar map for functional arguments, namely $$ \bar D_\psi(g,f):= \int g \log\left(\frac gf\right)\mathrm d\psi \tag{1'} $$ for which to be well-defined we assume that $$ \{f = 0\} \subseteq \{g = 0\}. \tag{2} $$

Now, these two notions are related as follows: $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$, where $$ \bar\nu(\cdot) := \int_{(\cdot)}g\,\mathrm d\psi\qquad \bar\mu(\cdot) := \int_{(\cdot)}f \,\mathrm d\psi $$ and of course $(2)$ implies that $\bar\nu\ll\bar\mu$. So indeed, to talk about the set $\mathcal G$ of all functions $g$, you need to assume that every function in this set satisfies $(2)$; but if you don't assume that, the KL divergence is infinite for those $g$ (the integrand involves the logarithm of an infinite ratio), so it is certainly greater than $\epsilon$.
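A small sketch of how condition $(2)$ plays out, using pmfs w.r.t. counting measure as $\psi$ (the particular numbers are my own, and the conventions $0\log 0 = 0$, $g\log(g/0)=\infty$ for $g>0$ are the standard ones):

```python
import math

def kl(g, f):
    """D_psi(g, f) for pmfs g, f w.r.t. counting measure psi.
    Returns inf when the support condition {f = 0} ⊆ {g = 0} fails."""
    total = 0.0
    for gi, fi in zip(g, f):
        if gi == 0.0:
            continue         # convention: 0 * log(0 / fi) = 0
        if fi == 0.0:
            return math.inf  # g puts mass where f has none: (2) fails
        total += gi * math.log(gi / fi)
    return total

# f vanishes at the last point but g does not, so (2) fails and D = inf;
# in the other direction {g = 0} is empty, so kl(f, g) is finite.
f = [0.5, 0.5, 0.0]
g = [0.4, 0.4, 0.2]
print(kl(g, f))  # inf
print(kl(f, g))  # finite
```

So any such $g$ is automatically excluded from $\mathcal G$, since $\infty > \epsilon$.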


Let me also summarize some relations in the one-dimensional case. The basic object is the probability measure $\mu:\mathscr B(\Bbb R) \to [0,1]$. Its CDF is a function on the real numbers, $F_\mu:\Bbb R\to [0,1]$, given by $F_\mu(x):=\mu((-\infty,x])$; hence, to each probability measure there corresponds a unique CDF. Vice versa, from any function satisfying a couple of properties we can construct a probability measure whose CDF is the given function, see e.g. here. Thus, probability measures on the real line and CDFs are in one-to-one correspondence; only the former is a function of sets, whereas the latter is a function of real numbers. If $\mu \ll \lambda$ then its RN derivative $f_\mu := \frac{\mathrm d\mu}{\mathrm d\lambda}:\Bbb R \to \Bbb R_+$ is commonly referred to as the density function of $\mu$; however, it would be more precise to say that $f_\mu$ is the density of $\mu$ w.r.t. $\lambda$. Notice that $$ F_\mu(x) = \int_{-\infty}^x\mu(\mathrm dt) = \int_{-\infty}^x f_\mu(t)\, \lambda(\mathrm dt), $$ hence if $\mu\ll\lambda$, then by the Lebesgue differentiation theorem $F'_\mu(x)$ exists $\lambda$-a.e. and $F'_\mu(x) = f_\mu(x)$ ($\lambda$-a.e.). For example, if $F_\mu\in C^1(\Bbb R)$ then $F'_\mu$ is a version of the RN derivative $\frac{\mathrm d\mu}{\mathrm d\lambda}$, and by changing $F'_\mu$ on $\lambda$-null sets in any way we obtain other versions of that RN derivative (since the RN derivative is only defined uniquely $\lambda$-a.e.). In fact, in most practical cases we compute RN derivatives using ordinary derivatives; there are not many other ways to compute them.
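The relation $F'_\mu = f_\mu$ can be seen numerically. A minimal sketch, assuming the Exp(1) distribution as a convenient example (my choice), where $F(x)=1-e^{-x}$ and $f(x)=e^{-x}$:

```python
import math

# CDF and density of Exp(1): differentiating the CDF recovers the
# RN derivative dmu/dlambda, i.e. F' = f.
def F(x):
    return 1.0 - math.exp(-x)

def f(x):
    return math.exp(-x)

h = 1e-6
x = 1.3
numeric = (F(x + h) - F(x - h)) / (2 * h)  # central difference of the CDF
print(numeric, f(x))  # the two values agree to high accuracy
```

Of course this only illustrates the a.e. identity at a point where $F_\mu$ is smooth; at an atom of $\mu$ the difference quotient would blow up instead.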